Preface


This textbook is intended for an introductory followed by an advanced course in 
linear algebra, with emphasis on its interactions with other topics in mathematics, 
such as calculus, geometry, and combinatorics. We took a straightforward path to 
the most important topic, linear maps between vector spaces, most of the time finite 
dimensional. However, since these concepts are fairly abstract and not necessarily 
natural at first sight, we included a few chapters with explicit examples of vector 
spaces such as the standard n-dimensional vector space over a field and spaces of 
matrices. We believe that it is fundamental for the student to be very familiar with 
these spaces before dealing with more abstract theory. In order to maximize the 
clarity of the concepts discussed, we included a rather lengthy chapter on 2 × 2
matrices and their applications, including the theory of Pell’s equations. This will 
help the student manipulate matrices and vectors in a concrete way before delving 
into the abstract and very powerful approach to linear algebra through the study of 
vector spaces and linear maps. 

The first few chapters deal with elementary properties of vectors and matrices 
and the basic operations that one can perform on them. A special emphasis is 
placed on the Gaussian Reduction algorithm and its applications. This algorithm 
provides efficient ways of computing some of the objects that appear naturally in 
abstract linear algebra such as kernels and images of linear maps, dimensions of 
vector spaces, and solutions to linear systems of equations. A student mastering
this algorithm and its applications will therefore have a much better chance of 
understanding many of the key notions and results introduced in subsequent 
chapters. 

The bulk of the book contains a comprehensive study of vector spaces and linear 
maps between them. We introduce and develop the necessary tools along the way, 
by discussing the many examples and problems proposed to the student. We offer a 
thorough exposition of central concepts in linear algebra through a problem-based 
approach. This is more challenging for the students, since they have to spend time 
trying to solve the proposed problems after reading and digesting the theoretical 
material. In order to assist with the comprehension of the material, we provided 
solutions to all problems posed in the theoretical part. On the other hand, at the 
end of each chapter, the student will find a rather long list of proposed problems, 
for which no solution is offered. This is because they are similar to the problems 
discussed in the theoretical part and thus should not cause difficulties to a reader 
who understood the theory. 

We truly hope that you will have a wonderful experience in your linear algebra 
journey. 


Richardson, TX, USA Titu Andreescu 
Chapter 1 
Matrix Algebra 


Abstract This chapter deals with matrices and the basic operations associated with 
them in a concrete way, paving the path to a more advanced study in later chapters. 
The emphasis is on special types of matrices and their stability under the described 
operations. 


Keywords Matrices • Operations • Invertible • Transpose • Orthogonal • Symmetric matrices


Before dealing with the abstract setup of vector spaces and linear maps between 
them, we find it convenient to discuss some properties of matrices. Matrices are a 
very handy way of describing linear phenomena while being very concrete objects. 
The goal of this chapter is to define these objects as well as some basic operations 
on them. 

Roughly, a matrix is a collection of “numbers” displayed in some rectangular 
board. We call these “numbers” the entries of the matrix. Very often, these “num- 
bers” are simply rational, real, or more generally complex numbers. However, these 
choices are not always adapted to our needs: in combinatorics and computer science, 
one works very often with matrices whose entries are residue classes of integers 
modulo prime numbers (especially modulo 2 in computer science), while other 
areas of mathematics work with matrices whose entries are polynomials, rational 
functions, or more generally continuous, differentiable, or integrable functions. 
There are rules allowing one to add and multiply matrices (if suitable conditions on the 
size of the matrices are satisfied), provided the set containing the entries of these matrices 
is stable under these operations. Fields are algebraic structures specially designed to 
have such properties (and more... ), and from this point of view they are excellent 
choices for the sets containing the entries of the matrices we want to study. 

The theory of fields is extremely beautiful and one can write a whole series of 
books on it. Even the basics can be fairly difficult to digest for a reader without some 
serious abstract algebra prerequisites. However, the purpose of this introductory 
book is not to deal with subtleties related to the theory of fields, so we decided 
to take the following rather pragmatic approach: we will only work with a very 
explicit set of fields in this book (we will say which ones in the next paragraphs), so 
the reader not familiar with abstract algebra will not need to know the subtleties of 
the theory of fields in the sequel. Of course, the reader familiar with this theory will 
realize that all the general results described in this book work over general fields. 

In most introductory books of linear algebra, one works exclusively over the 
fields R and C of real numbers and complex numbers, respectively. They are indeed 
sufficient for essentially all applications of matrices to analysis and geometry, but 
they are not sufficient for some interesting applications in computer science and 
combinatorics. We will introduce one more field that will be used from time to time 
in this book. This is the field $\mathbb{F}_2$ with two elements 0 and 1. It is endowed with 
addition and multiplication rules as follows: 


$$0 + 0 = 0, \qquad 0 + 1 = 1 + 0 = 1, \qquad 1 + 1 = 0$$

and

$$0 \cdot 0 = 0 \cdot 1 = 1 \cdot 0 = 0, \qquad 1 \cdot 1 = 1.$$

We do not limit ourselves exclusively to $\mathbb{R}$ and $\mathbb{C}$ since a certain number of issues 
arise from time to time when working with general fields, and the field $\mathbb{F}_2$ allows 
us to make a series of remarks about these issues. From this point of view, one can 
see $\mathbb{F}_2$ as a test object for some subtle issues arising in linear algebra over general 
fields. 
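The rules for $\mathbb{F}_2$ are exactly addition and multiplication of integers followed by reduction modulo 2. The following Python sketch (the helper names `f2_add` and `f2_mul` are our own, hypothetical choices) encodes them and prints both tables; note in particular that $1 + 1 = 0$, so every element of $\mathbb{F}_2$ is its own opposite.

```python
# Arithmetic in the field F_2 = {0, 1}, modelled as integers reduced modulo 2.

def f2_add(a: int, b: int) -> int:
    """Sum in F_2: 0+0=0, 0+1=1+0=1, 1+1=0."""
    return (a + b) % 2


def f2_mul(a: int, b: int) -> int:
    """Product in F_2: only 1*1 is nonzero."""
    return (a * b) % 2


# Print the addition and multiplication tables side by side.
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} + {b} = {f2_add(a, b)}    {a} * {b} = {f2_mul(a, b)}")
```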

Important convention: in the remainder of this book, we will work exclu- 
sively with one of the following fields: 


• the field $\mathbb{Q}$ of rational numbers;

• the field $\mathbb{R}$ of real numbers;

• the field $\mathbb{C}$ of complex numbers;

• the field $\mathbb{F}_2$ with two elements, with addition and multiplication rules 
described as above. 


We will assume familiarity with each of the sets Q, R and C as well as the 
basic operations that can be done with rational, real, or complex numbers (such as 
addition, multiplication, or division by nonzero numbers). 

We will reserve the letter F for one of these fields (if we do not want to 
specify which one of the previous fields we are working with, we will simply say 
“Let F be a field”). 

The even more pragmatic reader can take an even more practical approach and 
simply assume that F will stand for R or C in the sequel. 
Consider a field F. Its elements will be called scalars. 


Definition 1.1. Let n be a positive integer. We denote by $F^n$ the set of n-tuples 
of elements of F. The elements of $F^n$ are called vectors and are denoted either in 
row-form $X = (x_1, \dots, x_n)$ or in column-form

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

The scalar $x_i$ is called the ith coordinate of X (be it written in row or column form). 


The previous definition requires quite a few clarifications. First of all, note that 
if we want to be completely precise we should call an element of $F^n$ an n-vector 
or n-dimensional vector, to make it apparent that it lives in a set which depends 
on n. This would make a lot of statements fairly cumbersome, so we simply call the 
elements of $F^n$ vectors, without any reference to n. So (1) is a vector in $F^1$, while 
(1, 2) is a vector in $F^2$. There is no relation whatsoever between the two exhibited 
vectors, as they live in completely different sets a priori. 

While the abuse of notation discussed in the previous paragraph is rather easy 
to understand and accept, the convention about writing vectors either in row or in 
column form seems strange at first sight. It is easily understood once we introduce 
matrices and basic operations on them, as well as the link between matrices and 
vectors, so we advise the reader to take it simply as a convention for now and make 
no distinction between the vector $(v_1, \dots, v_n)$ and the vector

$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$

We will see later on that from the point of view of linear algebra the column notation is more useful. 

The zero vector in $F^n$ is denoted simply 0 and it is the vector whose coordinates 
are all equal to 0. Note that the notation 0 is again slightly abusive, since it does 
not make apparent the dependency on n: the zero vector in $F^2$ is definitely not the 
same object as the zero vector in $F^3$. However, this will (hopefully) not create any 
confusion, since in the sequel the context will always make it clear which zero vector 
we consider. 


Definition 1.2. Let m, n be positive integers. An m × n matrix with entries in F 
is a rectangular array

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

The scalar $a_{ij} \in F$ is called the (i, j)-entry of A. The column-vector

$$C_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$$

is called the jth column of A and the row-vector

$$L_i = [a_{i1}, a_{i2}, \dots, a_{in}]$$

is called the ith row of A. We denote by $M_{m,n}(F)$ the set of all m × n matrices with 
entries in F. 


Definition 1.3. A square matrix of order n with entries in F is a matrix $A \in 
M_{n,n}(F)$. We denote by $M_n(F)$ the set $M_{n,n}(F)$ of square matrices of order n. 


We can already give an explanation for our choice of denoting vectors in two 
different ways: an m × n matrix can be seen as a family of vectors, namely its rows. 
But it can also be seen as a family of vectors given by its columns. It is rather natural 
to denote rows of A in row-form and columns of A in column-form. Note that a row-vector 
in $F^n$ can be thought of as a 1 × n matrix, while a column-vector in $F^n$ can 
be thought of as an n × 1 matrix. From now on, whenever we write a vector as a row 
vector, we think of it as a matrix with one row, while when we write it in column 
form, we think of it as a matrix with one column. 


Remark 1.4. If $F_1 \subset F$ are fields, then we have a natural inclusion $M_{m,n}(F_1) \subset 
M_{m,n}(F)$: any matrix with entries in $F_1$ is naturally a matrix with entries in F. For 
instance the inclusions $\mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}$ induce inclusions of the corresponding sets of 
matrices, i.e.

$$M_{m,n}(\mathbb{Q}) \subset M_{m,n}(\mathbb{R}) \subset M_{m,n}(\mathbb{C}).$$


Whenever it is convenient, matrices in $M_{m,n}(F)$ will be denoted symbolically 
by capital letters A, B, C, ... or by $[a_{ij}], [b_{ij}], [c_{ij}], \dots$, where $a_{ij}, b_{ij}, c_{ij}, \dots$, 
respectively, represent the entries of the matrices. 


Example 1.5. a) The matrix $A = [a_{ij}] \in M_{2,3}(\mathbb{Q})$, where $a_{ij} = i^2 + j$, is given by

$$A = \begin{bmatrix} 2 & 3 & 4 \\ 5 & 6 & 7 \end{bmatrix}.$$
b) The matrix

$$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \\ 4 & 5 & 6 & 7 \end{bmatrix}$$

can also be written as the matrix $A = [a_{ij}] \in M_4(\mathbb{Q})$ with $a_{ij} = i + j - 1$. 


Remark 1.6. Two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ are equal if and only if they 
have the same size (i.e., the same number of columns and rows) and $a_{ij} = b_{ij}$ for 
all pairs (i, j). 


A certain number of matrices will appear rather constantly throughout the book 
and we would like to make a list of them. First of all, we have the zero m × n matrix, 
that is the matrix all of whose entries are equal to 0. Equivalently, it is the matrix 
all of whose rows are the zero vector in $F^n$, or the matrix all of whose columns are 
the zero vector in $F^m$. This matrix is denoted $O_{m,n}$ or, if the context is clear, simply 
0 (in this case, the context will make it clear that 0 is the zero matrix and not the 
element $0 \in F$). 

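Another extremely important matrix is the unit (or identity) matrix $I_n \in 
M_n(F)$, defined by

$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

with entries

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$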


Among the special but important classes of matrices that we will have to deal 
with quite often in the sequel, we mention: 


• The diagonal matrices. These are square matrices $A = [a_{ij}]$ such that $a_{ij} = 0$ 
unless $i = j$. The typical shape of a diagonal matrix is therefore

$$A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}.$$

• The upper-triangular matrices. These are square matrices $A = [a_{ij}]$ whose 
entries below the main diagonal are zero, that is $a_{ij} = 0$ whenever $i > j$. Hence 
the typical shape of an upper-triangular matrix is

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}.$$

Of course, one can also define lower-triangular matrices as those square matrices 
whose entries above the main diagonal are zero. 


We will deal now with the basic operations on matrices. Two matrices of the 
same size m x n can be added together to produce another matrix of the same size. 
The addition is done component-wise. The re-scaling of a matrix by a scalar is done 
by multiplying each entry by that scalar. The obtained matrix has the same size as 
the original one. More formally: 


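Definition 1.7. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be matrices in $M_{m,n}(F)$ and let $c \in F$ 
be a scalar. 

a) The sum A + B of the matrices A and B is the matrix

$$A + B = [a_{ij} + b_{ij}].$$

In fully expanded form

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{bmatrix}.$$

b) The re-scaling of A by c is the matrix

$$cA = [c\,a_{ij}].$$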


Remark 1.8. a) We insist on the fact that it does not make sense to add two 
matrices if they do not have the same size. 

b) We also write $-A$ instead of $(-1)A$, thus we write $A - B$ instead of $A + (-1)B$, 
if A and B have the same size. 


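Example 1.9. We have

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 2 & 3 \\ 1 & 3 & 4 & 4 \\ 2 & 3 & 5 & 6 \end{bmatrix}$$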
but

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 5 \end{bmatrix}$$

does not make sense. 
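As another example, we have

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}$$

in $M_{3,4}(\mathbb{R})$. 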
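On the other hand, we have the following equality in $M_{3,4}(\mathbb{F}_2)$:

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$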

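As we observed above, we can think of column-vectors in $F^n$ as 
n × 1 matrices, thus we can define addition and re-scaling for vectors by using the 
above definition for matrices. Explicitly, we have

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$

and for a scalar $c \in F$

$$c \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} c x_1 \\ c x_2 \\ \vdots \\ c x_n \end{bmatrix}.$$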
Similarly, we can define operations on row-vectors by thinking of them as matrices 
with only one row. 


Remark 1.10. a) Again, it makes sense to add two vectors if and only if they 
have the same number of coordinates. So it is nonsense to add a vector in $F^2$ 
and a vector in $F^3$. 

b) Similarly, we let $-X$ be the vector $(-1)X$ and, if $X, Y \in F^n$, we let $X - Y = 
X + (-Y)$. 


The following result follows from the basic properties of addition and multipli- 
cation rules in a field. We leave the formal proof to the reader. 


Proposition 1.11. For any matrices $A, B, C \in M_{m,n}(F)$ and any scalars $\alpha, \beta \in F$ 
we have

(A1) $(A + B) + C = A + (B + C)$ (associativity of the addition);

(A2) $A + B = B + A$ (commutativity of the addition);

(A3) $A + O_{m,n} = O_{m,n} + A = A$ (neutrality of $O_{m,n}$);

(A4) $A + (-A) = (-A) + A = O_{m,n}$ (cancellation with the opposite matrix);

(S1) $(\alpha + \beta)A = \alpha A + \beta A$ (distributivity of the re-scaling over scalar sums);

(S2) $\alpha(A + B) = \alpha A + \alpha B$ (distributivity of the re-scaling over matrix sums);

(S3) $\alpha(\beta A) = (\alpha\beta)A$ (homogeneity of the scalar product);

(S4) $1A = A$ (neutrality of 1). 


Since vectors in $F^n$ are the same thing as n × 1 matrices (or 1 × n matrices, 
according to our convention of representing vectors), the previous proposition 
implies that the properties (A1)-(A4) and (S1)-(S4) are also satisfied by vectors 
in $F^n$. Of course, this can also be checked directly from the definitions. 


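Definition 1.12. The canonical basis (or standard basis) of $F^n$ is the n-tuple of 
vectors $(e_1, \dots, e_n)$, where

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.$$

Thus $e_i$ is the vector in $F^n$ whose ith coordinate equals 1 and all other coordinates 
are equal to 0. 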


Remark 1.13. Observe that the meaning of $e_i$ depends on the context. For example, 
if we think of $e_1$ as the first standard basis vector in $F^2$ then $e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, but if we 
think of it as the first standard basis vector in $F^3$ then $e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$. It is customary not 
to introduce extra notation to distinguish such situations but to rely on the context 
in deciding on the meaning of $e_i$. 


The following result follows directly by unwinding definitions: 


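Proposition 1.14. Any vector $v \in F^n$ can be uniquely written as

$$v = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n$$

for some scalars $x_1, \dots, x_n \in F$. In fact, $x_1, \dots, x_n$ are precisely the coordinates 
of v. 

Proof. If $x_1, \dots, x_n$ are scalars, then by definition

$$x_1 e_1 + x_2 e_2 + \cdots + x_n e_n = \begin{bmatrix} x_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_2 \\ \vdots \\ 0 \end{bmatrix} + \cdots + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

Hence $x_1 e_1 + \cdots + x_n e_n = v$ if and only if $x_1, \dots, x_n$ are the coordinates of v, 
and the result follows. □ 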


We have similar results for matrices: 


Definition 1.15. Let m, n be positive integers. For $1 \le i \le m$ and $1 \le j \le n$ 
consider the matrix $E_{ij} \in M_{m,n}(F)$ whose (i, j)-entry equals 1 and all other entries 
are 0. 

The mn-tuple $(E_{11}, \dots, E_{1n}, E_{21}, \dots, E_{2n}, \dots, E_{m1}, \dots, E_{mn})$ is called the 
canonical basis (or standard basis) of $M_{m,n}(F)$. 


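Proposition 1.16. Any matrix $A \in M_{m,n}(F)$ can be uniquely expressed as

$$A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} E_{ij}$$

for some scalars $a_{ij}$. In fact, $a_{ij}$ is the (i, j)-entry of A. 

Proof. As in the proof of Proposition 1.14, one checks that for any scalars $x_{ij} \in F$ 
we have

$$\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} E_{ij} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix},$$

which yields the desired result. □ 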
Example 1.17. Let us express the matrix

$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 2 & 2 \end{bmatrix}$$

in terms of the canonical basis. We have

$$A = E_{11} + E_{12} + E_{22} + E_{23} + 2E_{33} + 2E_{34}.$$
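Example 1.17 can be reproduced mechanically: by Proposition 1.16 the coefficient of $E_{ij}$ is just the (i, j)-entry, and summing $a_{ij} E_{ij}$ rebuilds the matrix. The helper names below (`E`, `coefficients`, `combine`) are our own; indices are 1-based to match the text.

```python
def E(i, j, m, n):
    """Matrix unit E_ij in M_{m,n}: entry 1 at position (i, j), 1-indexed, 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(1, n + 1)]
            for r in range(1, m + 1)]


def coefficients(A):
    """Nonzero coefficients a_ij in A = sum_{i,j} a_ij E_ij (Proposition 1.16)."""
    return {(i + 1, j + 1): a
            for i, row in enumerate(A)
            for j, a in enumerate(row) if a != 0}


def combine(coeffs, m, n):
    """Compute the sum of c * E_ij over the given coefficients, entry by entry."""
    S = [[0] * n for _ in range(m)]
    for (i, j), c in coeffs.items():
        Eij = E(i, j, m, n)
        S = [[s + c * e for s, e in zip(srow, erow)]
             for srow, erow in zip(S, Eij)]
    return S


A = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 2, 2]]
coeffs = coefficients(A)
print(coeffs)  # {(1, 1): 1, (1, 2): 1, (2, 2): 1, (2, 3): 1, (3, 3): 2, (3, 4): 2}
```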


1.1.1 Problems for Practice 


1. Write down explicitly the entries of the matrix $A = [a_{ij}] \in M_{2,3}(\mathbb{R})$ in each of 
the following cases: 


a) $a_{ij} = \dfrac{1}{i + j}$.
b) $a_{ij} = i + 2j$.
c) $a_{ij} = ij$.


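2. For each of the following pairs of matrices (A, B) explain which of the matrices 
A + B and A - 2B make sense and compute these matrices whenever they do 
make sense: 

a) $A = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 1 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix}$.

b) $A = [1\ 1\ 0\ 0]$ and $B = [1\ 1\ 0]$.

c) $A = \begin{bmatrix} 3 & 1 & 0 \\ -1 & -1 & 1 \\ 2 & 0 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} -2 & 1 & 0 \\ 4 & -1 & 1 \\ 6 & 4 & 3 \end{bmatrix}$.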


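3. Consider the vectors

$$v_1 = \begin{bmatrix} 1 \\ -2 \\ 3 \\ 1 \\ 4 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 2 \\ -1 \\ 4 \\ 3 \end{bmatrix}.$$

What are the coordinates of the vector $v_1 + 2v_2$? 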


4. Express the matrix

$$A = \begin{bmatrix} 3 & 1 & 0 & -4 \\ 7 & -1 & 1 & -2 \\ 8 & 9 & 5 & -3 \end{bmatrix}$$

in terms of the canonical basis of $M_{3,4}(\mathbb{R})$. 


the matrix $E_{11} - 3E_{12} + 4E_{23}$. 
6. Let F be a field. 


a) Prove that if $A, B \in M_n(F)$ are diagonal matrices, then $A + cB$ is a diagonal 
matrix for any $c \in F$. 

b) Prove that the same result holds if we replace diagonal with upper-triangular. 

c) Prove that any matrix $A \in M_n(F)$ can be written as the sum of an upper-triangular 
matrix and of a lower-triangular matrix. Is such a decomposition unique? 

7. a) How many distinct matrices are there in $M_n(\mathbb{F}_2)$? 
b) How many of these matrices are diagonal? 
c) How many of these matrices are upper-triangular? 


1.2 Matrices as Linear Maps 


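In this section we will explain how to see a matrix as a map on vectors. Let F be 
a field and let $A \in M_{m,n}(F)$ be a matrix with entries $a_{ij}$. To each vector
$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in F^n$ we associate a new vector $AX \in F^m$ defined by

$$AX = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix}.$$

We obtain therefore a map $F^n \to F^m$ which sends X to AX. 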
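Example 1.18. The map associated with the matrix

$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \in M_{3,4}(\mathbb{R})$$

is the map $f: \mathbb{R}^4 \to \mathbb{R}^3$ defined by

$$f\left( \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} \right) = \begin{bmatrix} x + y \\ x + y + z \\ z + t \end{bmatrix}.$$

In terms of row-vectors we have

$$f(x, y, z, t) = (x + y,\ x + y + z,\ z + t).$$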
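The rule defining AX is a row-by-row computation: the ith coordinate pairs the ith row of A with X. A minimal Python sketch (the name `mat_vec` is our own) applied to the matrix of Example 1.18 at the sample point (x, y, z, t) = (1, 2, 3, 4) returns (x + y, x + y + z, z + t) = (3, 6, 7).

```python
def mat_vec(A, X):
    """Return AX: the ith coordinate is a_i1 x_1 + ... + a_in x_n."""
    if any(len(row) != len(X) for row in A):
        raise ValueError("the number of columns of A must equal the length of X")
    return [sum(a * x for a, x in zip(row, X)) for row in A]


# The matrix of Example 1.18.
A = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]]
x, y, z, t = 1, 2, 3, 4
print(mat_vec(A, [x, y, z, t]))  # [3, 6, 7]
```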


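Remark 1.19. Consider the canonical basis $e_1, \dots, e_n$ of $F^n$. Then by definition for 
all $1 \le i \le n$

$$A e_i = C_i = \begin{bmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi} \end{bmatrix},$$

the ith column of A. In general, if $X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in F^n$ is any vector, then

$$AX = x_1 C_1 + x_2 C_2 + \cdots + x_n C_n,$$

as follows directly from the definition of AX. 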
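Both statements of Remark 1.19 can be checked on a small example: $Ae_i$ picks out the ith column, and computing AX row by row agrees with forming the combination $x_1 C_1 + \cdots + x_n C_n$ column by column. The 2 × 3 matrix and the vector X below are arbitrary choices of ours, and the helper names are hypothetical.

```python
def mat_vec(A, X):
    """AX computed row by row from the definition."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]


def column(A, i):
    """The column C_{i+1} of A (i is a 0-based Python index)."""
    return [row[i] for row in A]


def basis_vector(i, n):
    """The canonical basis vector e_{i+1} of F^n (i is 0-based)."""
    return [1 if k == i else 0 for k in range(n)]


def column_combination(A, X):
    """x_1 C_1 + ... + x_n C_n, computed column by column."""
    m, n = len(A), len(A[0])
    return [sum(X[i] * column(A, i)[r] for i in range(n)) for r in range(m)]


A = [[2, 0, 1],
     [1, 3, -1]]  # an arbitrary 2 x 3 example
X = [1, 2, 3]

for i in range(3):
    assert mat_vec(A, basis_vector(i, 3)) == column(A, i)  # A e_i = C_i
print(mat_vec(A, X), column_combination(A, X))  # [5, 4] [5, 4]
```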
The key properties of this correspondence are summarized in the following: 


Theorem 1.20. For all matrices $A, B \in M_{m,n}(F)$, all vectors $X, Y \in F^n$ and all 
scalars $\alpha, \beta \in F$ we have

a) $A(\alpha X + \beta Y) = \alpha AX + \beta AY$. 

b) $(\alpha A + \beta B)X = \alpha AX + \beta BX$. 

c) If $AX = BX$ for all $X \in F^n$, then $A = B$. 


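Proof. Writing $A = [a_{ij}]$, $B = [b_{ij}]$, and

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},$$

we have

$$\alpha A + \beta B = [\alpha a_{ij} + \beta b_{ij}] \quad \text{and} \quad \alpha X + \beta Y = \begin{bmatrix} \alpha x_1 + \beta y_1 \\ \alpha x_2 + \beta y_2 \\ \vdots \\ \alpha x_n + \beta y_n \end{bmatrix}.$$

a) By definition, the ith coordinate of $A(\alpha X + \beta Y)$ is

$$\sum_{j=1}^{n} a_{ij}(\alpha x_j + \beta y_j) = \alpha \sum_{j=1}^{n} a_{ij} x_j + \beta \sum_{j=1}^{n} a_{ij} y_j.$$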
The right-hand side is the ith coordinate of $\alpha AX + \beta AY$, giving the desired 
result. 

b) The argument is identical: the equality is equivalent to

$$\sum_{j=1}^{n} (\alpha a_{ij} + \beta b_{ij}) x_j = \alpha \sum_{j=1}^{n} a_{ij} x_j + \beta \sum_{j=1}^{n} b_{ij} x_j \quad \text{for } 1 \le i \le m,$$

which is clear. 

c) By hypothesis we have $Ae_i = Be_i$ for $1 \le i \le n$, where $e_1, \dots, e_n$ is the canonical basis of 
$F^n$. Then Remark 1.19 shows that the ith column of A equals the ith column of 
B for $1 \le i \le n$, which is enough to conclude that $A = B$. □ 


We obtain therefore an injective map $A \mapsto (X \mapsto AX)$ from $M_{m,n}(F)$ to the set 
of maps $\varphi: F^n \to F^m$ which satisfy

$$\varphi(\alpha X + \beta Y) = \alpha \varphi(X) + \beta \varphi(Y)$$

for all $X, Y \in F^n$ and $\alpha, \beta \in F$. Such a map $\varphi: F^n \to F^m$ is called linear. Note 
that a linear map necessarily satisfies $\varphi(0) = 0$ (take $\alpha = \beta = 0$ in the previous 
relation), hence this notion is different from the convention used in some other areas 
of mathematics (in linear algebra a map $\varphi(X) = aX + b$ is usually referred to as 
an affine map). 

The following result shows that we obtain all linear maps by the previous 
procedure: 


Theorem 1.21. Let $\varphi: F^n \to F^m$ be a linear map. There is a unique matrix 
$A \in M_{m,n}(F)$ such that $\varphi(X) = AX$ for all $X \in F^n$. 


Proof. The uniqueness assertion is exactly part c) of the previous theorem, so let us 
focus on the existence issue. Let $\varphi: F^n \to F^m$ be a linear map and let $e_1, \dots, e_n$ be 
the canonical basis of $F^n$. Consider the matrix A whose ith column $C_i$ equals the 
vector $\varphi(e_i) \in F^m$. By Remark 1.19 we have $Ae_i = C_i = \varphi(e_i)$ for all $1 \le i \le n$. 
If $X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in F^n$ is an arbitrary vector, then $X = x_1 e_1 + \cdots + x_n e_n$, thus since 
$\varphi$ is linear, we have

$$\varphi(X) = \varphi(x_1 e_1 + \cdots + x_n e_n) = x_1 \varphi(e_1) + \cdots + x_n \varphi(e_n) = x_1 C_1 + \cdots + x_n C_n = AX,$$

the last equality being again a consequence of Remark 1.19. Thus $\varphi(X) = AX$ for 
all $X \in F^n$ and the theorem is proved. □ 
We obtain therefore a bijection between matrices in $M_{m,n}(F)$ and linear 
maps $F^n \to F^m$. 


Example 1.22. Let us consider the map $f: \mathbb{R}^4 \to \mathbb{R}^3$ defined by

$$f(x, y, z, t) = (x - 2y + z,\ 2x - 3z + t,\ t - x).$$

What is the matrix $A \in M_{3,4}(\mathbb{R})$ corresponding to this linear map? By Remark 1.19, 
we must have $f(e_i) = C_i$, where $e_1, e_2, e_3, e_4$ is the canonical basis of $\mathbb{R}^4$ and 
$C_1, C_2, C_3, C_4$ are the successive columns of A. Thus, in order to find A, it suffices 
to compute the vectors $f(e_1), \dots, f(e_4)$. We have

$$\begin{aligned} f(e_1) &= f(1, 0, 0, 0) = (1, 2, -1), & f(e_2) &= f(0, 1, 0, 0) = (-2, 0, 0), \\ f(e_3) &= f(0, 0, 1, 0) = (1, -3, 0), & f(e_4) &= f(0, 0, 0, 1) = (0, 1, 1). \end{aligned}$$

Hence

$$A = \begin{bmatrix} 1 & -2 & 1 & 0 \\ 2 & 0 & -3 & 1 \\ -1 & 0 & 0 & 1 \end{bmatrix}.$$
In practice, one can avoid computing $f(e_1), \dots, f(e_4)$ as we did before: we look 
at the first coordinate of the vector $f(x, y, z, t)$, that is $x - 2y + z$. We write it 
as $1 \cdot x + (-2) \cdot y + 1 \cdot z + 0 \cdot t$ and this gives us the first row of A, namely 
$[1\ {-2}\ 1\ 0]$. Next, we look at the second coordinate of $f(x, y, z, t)$ and write it as 
$2 \cdot x + 0 \cdot y + (-3) \cdot z + 1 \cdot t$, which gives the second row $[2\ 0\ {-3}\ 1]$ of A. We 
proceed similarly with the last row. 
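The recipe from the proof of Theorem 1.21 (evaluate the map on $e_1, \dots, e_n$ and use the results as columns) is easy to automate. In this sketch `matrix_of` is a hypothetical helper of ours; it assumes the map is given as a Python function of n scalar arguments, and it only recovers the map when the map is linear.

```python
def f(x, y, z, t):
    """The linear map R^4 -> R^3 of Example 1.22."""
    return (x - 2 * y + z, 2 * x - 3 * z + t, t - x)


def matrix_of(phi, n):
    """Matrix whose ith column is phi(e_i), as in the proof of Theorem 1.21.

    Only meaningful when phi is linear; phi takes n scalar arguments.
    """
    columns = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        columns.append(list(phi(*e_i)))
    m = len(columns[0])
    # Turn the list of columns into a list of rows.
    return [[columns[j][r] for j in range(n)] for r in range(m)]


A = matrix_of(f, 4)
for row in A:
    print(row)
# [1, -2, 1, 0]
# [2, 0, -3, 1]
# [-1, 0, 0, 1]
```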


1.2.1 Problems for Practice 


1. Describe the linear maps associated with the matrices 


a E Bee 
ive =24 2-325 


2. Consider the map $f: \mathbb{R}^3 \to \mathbb{R}^4$ defined by

$$f(x, y, z) = (x - 2y + 2z,\ y - z + x,\ x,\ z).$$

Prove that f is linear and describe the matrix associated with f. 
3. a) Consider the map $f: \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$f(x, y) = (x^2, y^2).$$

Is this map linear? 


b) Answer the same question with the field $\mathbb{R}$ replaced with $\mathbb{F}_2$. 

4. Consider the map $f: \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$f(x, y) = (x + 2y,\ x + y - 1).$$

Is the map f linear? 


5. Consider the matrix

$$A = \begin{bmatrix} 1 & -2 & 2 & 0 \\ 2 & 0 & 4 & 1 \\ -1 & 1 & 0 & 1 \end{bmatrix}.$$

Describe the image of the vector v = 
1 
: through the linear map attached to A. 
2 


6. Give an example of a map $f: \mathbb{R}^2 \to \mathbb{R}$ which is not linear and for which

$$f(av) = a f(v)$$

for all $a \in \mathbb{R}$ and all $v \in \mathbb{R}^2$. 
Let us now consider three positive integers m, n, p and $A \in M_{m,n}(F)$, $B \in 
M_{n,p}(F)$. We insist on the fact that the number of columns n of A equals the 
number of rows n of B. We saw in the previous section that A and B define natural 
maps

$$\varphi_A: F^n \to F^m, \qquad \varphi_B: F^p \to F^n,$$

sending $X \in F^n$ to $AX \in F^m$ and $Y \in F^p$ to $BY \in F^n$. 
Let us consider the composite map

$$\varphi_A \circ \varphi_B: F^p \to F^m, \qquad (\varphi_A \circ \varphi_B)(X) = \varphi_A(\varphi_B(X)).$$

Since $\varphi_A$ and $\varphi_B$ are linear, so is $\varphi_A \circ \varphi_B$: indeed,

$$\varphi_A(\varphi_B(\alpha X + \beta Y)) = \varphi_A(\alpha \varphi_B(X) + \beta \varphi_B(Y)) = \alpha \varphi_A(\varphi_B(X)) + \beta \varphi_A(\varphi_B(Y)).$$

Thus by Theorem 1.21 there is a unique matrix $C \in M_{m,p}(F)$ such that

$$\varphi_A \circ \varphi_B = \varphi_C.$$

Let us summarize this discussion in the following fundamental definition: 
Definition 1.23. The product of two matrices $A \in M_{m,n}(F)$ and $B \in M_{n,p}(F)$ 
(such that the number of columns n of A equals the number of rows n of B) is the 
unique matrix $AB \in M_{m,p}(F)$ such that

$$A(BX) = (AB)X$$

for all $X \in F^p$. 


Remark 1.24. Here is a funny thing, which shows that the theory developed so far is 
coherent: consider a matrix $A \in M_{m,n}(F)$ and a vector $X \in F^n$, written in column-form. 
As we said, we can think of X as a matrix with one column, i.e., a matrix 
$X \in M_{n,1}(F)$. Then we can consider the product $AX \in M_{m,1}(F)$. Identifying again 
$M_{m,1}(F)$ with column-vectors of length m, i.e., with $F^m$, the matrix product AX becomes identified 
with the vector AX, the image of X through the linear map canonically attached to A. In 
other words, when writing AX we can either think of the image of X through the 
canonical map attached to A (and we strongly encourage the reader to do so) or 
of the product of the matrix A and of a matrix in $M_{n,1}(F)$. The result is the same, 
modulo the natural identification between column-vectors and matrices with one 
column. 


The previous definition is a little bit abstract, so let us try to compute explicitly 
the entries of AB in terms of the entries $a_{ij}$ of A and $b_{ij}$ of B. Let $e_1, \dots, e_p$ 
be the canonical basis of $F^p$. Then $(AB)e_j$ is the jth column of AB by 
Remark 1.19. Let $C_1(A), \dots, C_n(A)$ and $C_1(B), \dots, C_p(B)$ be the columns of A 
and B respectively. Using again Remark 1.19, we can write

$$A(Be_j) = A C_j(B) = b_{1j} C_1(A) + b_{2j} C_2(A) + \cdots + b_{nj} C_n(A).$$

Since by definition $A(Be_j) = (AB)e_j = C_j(AB)$, we obtain

$$C_j(AB) = b_{1j} C_1(A) + b_{2j} C_2(A) + \cdots + b_{nj} C_n(A). \tag{1.1}$$

Comparing the ith coordinates of both sides of (1.1), and recalling that the ith coordinate of $C_k(A)$ is $a_{ik}$, we conclude that

$$(AB)_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj} \tag{1.2}$$

and so we have established the following 


Theorem 1.25 (Product Rule). Let $A = [a_{ij}] \in M_{m,n}(F)$ and $B = [b_{ij}] \in 
M_{n,p}(F)$. Then the (i, j)-entry of the matrix AB is

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$
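The product rule gives a direct algorithm for AB: each entry is a sum over the shared index k. The sketch below (`mat_mul` and `mat_vec` are our own helper names, and the matrices are arbitrary test data) also checks the defining property of Definition 1.23, namely $(AB)X = A(BX)$, on one vector.

```python
def mat_mul(A, B):
    """Product AB via the product rule (AB)_ij = sum_k a_ik b_kj (Theorem 1.25)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


def mat_vec(A, X):
    """AX computed row by row."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]


# Arbitrary test data: A is 2 x 3, B is 3 x 2, X is a vector in F^2.
A = [[1, 2, 0], [0, -1, 3]]
B = [[1, 1], [2, 0], [-1, 4]]
X = [5, -2]

AB = mat_mul(A, B)
print(AB)  # [[5, 1], [-5, 12]]
# Definition 1.23 requires (AB)X = A(BX).
assert mat_vec(AB, X) == mat_vec(A, mat_vec(B, X))
```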


Of course, one could also take the previous theorem as a definition of the product 
of two matrices. But it is definitely not apparent why one should define the product 
in such a complicated way: for instance a very natural way of defining the product 
would be component-wise (i.e., the (i, j )-entry of the product should be the product 
of the (i, j)-entries in A and B), but this naive definition is not useful for the 
purposes of linear algebra. The key point to be kept in mind is that for the purposes 
of linear algebra (and not only), matrices should be thought of as linear maps, 
and the product should correspond to the composition of linear maps. 


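Example 1.26. a) If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ and $B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$ are matrices in $M_2(F)$, 
then AB exists and

$$AB = \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{bmatrix}.$$

b) If

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix},$$

then the product AB is defined and it is the 3 × 2 matrix

$$AB = \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \\ a_{31} b_{11} + a_{32} b_{21} & a_{31} b_{12} + a_{32} b_{22} \end{bmatrix}.$$

The product BA is not defined since $B \in M_2(F)$ and $A \in M_{3,2}(F)$: the number of columns of B (namely 2) differs from the number of rows of A (namely 3). 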
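c) Considering

$$A = \begin{bmatrix} 1 & -1 \\ 2 & 0 \\ -1 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} -1 & 2 \\ -3 & 1 \end{bmatrix},$$

we get

$$AB = \begin{bmatrix} 2 & 1 \\ -2 & 4 \\ -8 & 1 \end{bmatrix}.$$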
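The product in c) can be checked with a direct implementation of the product rule. In this sketch B is assumed to be the 2 × 2 matrix with rows (-1, 2) and (-3, 1); the helper name `mat_mul` is our own.

```python
def mat_mul(A, B):
    """Product rule (AB)_ij = sum_k a_ik b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, -1], [2, 0], [-1, 3]]  # 3 x 2
B = [[-1, 2], [-3, 1]]          # 2 x 2
print(mat_mul(A, B))  # [[2, 1], [-2, 4], [-8, 1]]
```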
Then, both products AB and BA are defined and we have

$$AB = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = O_2, \quad \text{while} \quad BA \neq O_2.$$


This last example shows two important things: 

• multiplication of matrices (even in $M_2(F)$) is not commutative, i.e., generally 
$AB \neq BA$ when AB and BA both make sense (this is the case if $A, B \in M_n(F)$, 
for instance). 

• There are nonzero matrices A, B whose product is 0: for instance in this example 
we have $A \neq O_2$, $B \neq O_2$, but $AB = O_2$. 
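Both phenomena are easy to reproduce. The two nonzero 2 × 2 matrices below are our own illustrative choice (not necessarily those of the example above): their product AB is the zero matrix, while BA is not, so in particular $AB \neq BA$.

```python
def mat_mul(A, B):
    """Product rule (AB)_ij = sum_k a_ik b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Our own illustrative choice of nonzero matrices in M_2.
A = [[0, 1], [0, 0]]
B = [[1, 0], [0, 0]]
print(mat_mul(A, B))  # [[0, 0], [0, 0]]  so AB = O_2
print(mat_mul(B, A))  # [[0, 1], [0, 0]]  so BA != AB
```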


Definition 1.27. Two matrices $A, B \in M_n(F)$ commute if
$AB = BA$. 


One has to be very careful when using algebraic operations on matrices, since 
multiplication is not commutative in general. For instance, one uses quite often 
identities such as 


$$(a + b)^2 = a^2 + 2ab + b^2, \qquad (a + b)(a - b) = a^2 - b^2$$


for elements of a field F. Such identities are (in general) no longer true if a, b are 
matrices and they should be replaced by the following correct identities 


$$(A + B)^2 = A^2 + AB + BA + B^2, \qquad (A + B)(A - B) = A^2 - AB + BA - B^2.$$


We see that the previous identities (which hold for elements of a field) hold for A 
and B if and only if A and B commute. 
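A concrete pair of non-commuting matrices shows the difference. With our own choice $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, the sketch below (hypothetical helper names) computes $(A + B)^2$, the correct expansion $A^2 + AB + BA + B^2$, and the field-style formula $A^2 + 2AB + B^2$, which gives a different answer.

```python
def mat_mul(A, B):
    """Product rule (AB)_ij = sum_k a_ik b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_sum(*Ms):
    """Entrywise sum of several matrices of the same size."""
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*Ms)]


def mat_scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
S = mat_sum(A, B)

square = mat_mul(S, S)
correct = mat_sum(mat_mul(A, A), mat_mul(A, B), mat_mul(B, A), mat_mul(B, B))
naive = mat_sum(mat_mul(A, A), mat_scale(2, mat_mul(A, B)), mat_mul(B, B))
print(square)   # [[5, 4], [4, 5]]
print(correct)  # [[5, 4], [4, 5]]
print(naive)    # [[6, 4], [4, 4]]
print(mat_mul(A, B) == mat_mul(B, A))  # False: A and B do not commute
```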

Matrix multiplication obeys many of the familiar arithmetical laws apart from 
the commutativity property. More precisely, we have the following: 


Proposition 1.28. Multiplication of matrices has the following properties: 

1) Associativity: we have $(AB)C = A(BC)$ for all matrices $A \in M_{m,n}(F)$, $B \in 
M_{n,p}(F)$, $C \in M_{p,q}(F)$. 

2) Compatibility with scalar multiplication: we have $\alpha(AB) = (\alpha A)B = A(\alpha B)$ 
if $\alpha \in F$, $A \in M_{m,n}(F)$ and $B \in M_{n,p}(F)$. 

3) Distributivity with respect to addition: we have

$$(A + B)C = AC + BC \quad \text{if } A, B \in M_{m,n}(F) \text{ and } C \in M_{n,p}(F),$$

and

$$D(A + B) = DA + DB \quad \text{if } A, B \in M_{m,n}(F) \text{ and } D \in M_{p,m}(F).$$
All these properties follow quite easily from Definition 1.23 or Theorem 1.25. Let 
us prove for instance the associativity property (which would be the most painful 
to check by bare hands if we took Theorem 1.25 as a definition). It suffices (by 
Theorem 1.21) to check that for all $X \in F^q$ we have

$$((AB)C)X = (A(BC))X.$$

But by definition of the product we have

$$((AB)C)X = (AB)(CX) = A(B(CX))$$

and

$$(A(BC))X = A((BC)X) = A(B(CX)),$$
and the result follows. One could also use Theorem 1.25 and check by a rather 
painful computation that the (i, j)-entry in (AB)C equals the (i, j)-entry in 
A(BC), by showing that they are both equal to

$$\sum_{k=1}^{n} \sum_{l=1}^{p} a_{ik} b_{kl} c_{lj}.$$

All other properties of multiplication stated in the previous proposition are 
proved in exactly the same way and we leave it to the reader to fill in the details. 
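Associativity can be tested numerically (a check, not a proof). The sketch below uses arbitrary matrices of our own with sizes 2 × 3, 3 × 2 and 2 × 4, compares $(AB)C$ with $A(BC)$, and compares both with the double sum $\sum_{k,l} a_{ik} b_{kl} c_{lj}$; the helper names are hypothetical.

```python
def mat_mul(A, B):
    """Product rule (AB)_ij = sum_k a_ik b_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def triple_entry(A, B, C, i, j):
    """The double sum over k and l of a_ik * b_kl * c_lj."""
    return sum(A[i][k] * B[k][l] * C[l][j]
               for k in range(len(B)) for l in range(len(C)))


A = [[1, 2, 0], [3, -1, 4]]        # 2 x 3
B = [[2, 1], [0, 1], [-1, 3]]      # 3 x 2
C = [[1, 0, 2, 1], [4, -2, 0, 1]]  # 2 x 4

left = mat_mul(mat_mul(A, B), C)
right = mat_mul(A, mat_mul(B, C))
assert left == right
assert all(left[i][j] == triple_entry(A, B, C, i, j)
           for i in range(2) for j in range(4))
print(left)  # [[14, -6, 4, 5], [58, -28, 4, 16]]
```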


Remark 1.29. Because of the associativity property we can simply write ABCD 
instead of the cumbersome ((AB)C)D, which also equals (A(BC))D or 
A(B(CD)). Similarly, we define the product of any number of matrices. When 
these matrices are all equal we use the notation

$$A^n = A \cdot A \cdots A,$$

with n factors in the right-hand side. This is the nth power of the matrix A. Note 
that it only makes sense to define the powers of a square matrix! By construction we 
have

$$A^n = A \cdot A^{n-1}.$$

We make the natural convention that $A^0 = I_n$ for any $A \in M_n(F)$. The reader 
will have no difficulty in checking that $I_n$ is a unit for matrix multiplication, in the 
sense that

$$A \cdot I_n = A \quad \text{and} \quad I_m \cdot A = A \quad \text{if } A \in M_{m,n}(F).$$
We end this section with a long list of problems which illustrate the concepts
introduced so far. 


Problem 1.30. Let $A(x) \in M_3(\mathbb{R})$ be the matrix defined by

$$A(x) = \begin{pmatrix} 1 & x & x^2 \\ 0 & 1 & 2x \\ 0 & 0 & 1 \end{pmatrix}.$$

Prove that $A(x_1)A(x_2) = A(x_1 + x_2)$ for all $x_1, x_2 \in \mathbb{R}$.


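Solution. Using the product rule given by Theorem 1.25, we obtain

$$A(x_1)A(x_2) = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 0 & 1 & 2x_1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & x_2 & x_2^2 \\ 0 & 1 & 2x_2 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & x_1 + x_2 & x_2^2 + 2x_1x_2 + x_1^2 \\ 0 & 1 & 2x_2 + 2x_1 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & x_1 + x_2 & (x_1 + x_2)^2 \\ 0 & 1 & 2(x_1 + x_2) \\ 0 & 0 & 1 \end{pmatrix}.$$

By definition, the last matrix is simply $A(x_1 + x_2)$. □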
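The identity of Problem 1.30 can also be checked on sample values. This sketch (our own helper names `matmul` and `A_of`) uses exact rationals from Python's `fractions` module, so that the comparison is not affected by rounding; it only tests finitely many pairs and is meant as an illustration of the computation above.

```python
from fractions import Fraction


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def A_of(x):
    """The matrix A(x) of Problem 1.30."""
    return [[1, x, x * x],
            [0, 1, 2 * x],
            [0, 0, 1]]


samples = [Fraction(-3), Fraction(1, 2), Fraction(0), Fraction(7, 3)]
ok = all(matmul(A_of(x1), A_of(x2)) == A_of(x1 + x2)
         for x1 in samples for x2 in samples)
print(ok)  # True
```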


The result established in the following problem is very useful and constantly used 
in practice: 


Problem 1.31. a) Prove that the product of two diagonal matrices is a diagonal 
matrix. 

b) Prove that the product of two upper-triangular matrices is upper-triangular. 

c) Prove that in both cases the diagonal entries of the product are the product of the 
corresponding diagonal entries. 


Solution. a) Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be two diagonal matrices in $M_n(F)$. Let $i \neq j \in \{1, \ldots, n\}$. Using the product rule, we obtain

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$

We claim that $a_{ik}b_{kj} = 0$ for all $k \in \{1, 2, \ldots, n\}$, thus $(AB)_{ij} = 0$ for all $i \neq j$ and $AB$ is diagonal. To prove the claim, note that since $i \neq j$, we have $i \neq k$ or $j \neq k$. Thus either $a_{ik} = 0$ (since $A$ is diagonal) or $b_{kj} = 0$ (since $B$ is diagonal), thus in all cases $a_{ik}b_{kj} = 0$ and the claim is proved.

b) Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be upper-triangular matrices in $M_n(F)$. We want to prove that $(AB)_{ij} = 0$ for all $i > j$. By the product rule,

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},$$

thus it suffices to prove that for all $i > j$ and all $k \in \{1, 2, \ldots, n\}$ we have $a_{ik}b_{kj} = 0$. Fix $i > j$ and $k \in \{1, 2, \ldots, n\}$ and suppose that $a_{ik}b_{kj} \neq 0$, thus $a_{ik} \neq 0$ and $b_{kj} \neq 0$. Since $A$ and $B$ are upper-triangular, we deduce that $i \leq k$ and $k \leq j$, thus $i \leq j$, a contradiction.

c) Again, using the product rule we compute

$$(AB)_{ii} = \sum_{k=1}^{n} a_{ik} b_{ki}.$$

Assume that $A$ and $B$ are upper-triangular (which includes the case when they are both diagonal). If $a_{ik}b_{ki}$ is nonzero for some $k \in \{1, 2, \ldots, n\}$, then $i \leq k$ and $k \leq i$, thus $k = i$. We conclude that

$$(AB)_{ii} = a_{ii} b_{ii}$$

and the result follows. □
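Problem 1.31 b) and c) can be illustrated on random upper-triangular integer matrices. The sketch below (helper names are our own) checks that the product is again upper-triangular and that its diagonal is the entrywise product of the two diagonals; it is an illustration, not a substitute for the proof.

```python
import random


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def is_upper_triangular(M):
    """True when every entry strictly below the main diagonal is zero."""
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(i))


def random_upper_triangular(n, rng):
    return [[rng.randint(-9, 9) if j >= i else 0 for j in range(n)] for i in range(n)]


rng = random.Random(1)
n = 5
A = random_upper_triangular(n, rng)
B = random_upper_triangular(n, rng)
P = matmul(A, B)
print(is_upper_triangular(P))                               # True
print(all(P[i][i] == A[i][i] * B[i][i] for i in range(n)))  # True
```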


Problem 1.32. A matrix $A \in M_n(\mathbb{R})$ is called right stochastic if all entries are nonnegative real numbers and the sum of the entries in each row equals 1. We define the concept of left stochastic matrix similarly by replacing the word row with column. Finally, a matrix is called doubly stochastic if it is simultaneously left and right stochastic.


a) Prove that the product of two left stochastic matrices is a left stochastic matrix. 

b) Prove that the product of two right stochastic matrices is a right stochastic matrix. 

c) Prove that the product of two doubly stochastic matrices is a doubly stochastic 
matrix. 


Solution. Note that c) is just the combination of a) and b). The argument for proving b) is identical to the one used to prove a), thus we will only prove part a) and leave the details for part b) to the reader. Consider thus two left stochastic matrices $A, B \in M_n(\mathbb{R})$, say $A = [a_{ij}]$ and $B = [b_{ij}]$. Thus $a_{ij} \geq 0$, $b_{ij} \geq 0$ for all $i, j \in \{1, 2, \ldots, n\}$ and moreover the sum of the entries in each column of $A$ or $B$ is 1, which can be written as

$$\sum_{k=1}^{n} a_{ki} = \sum_{k=1}^{n} b_{ki} = 1$$

for $i \in \{1, 2, \ldots, n\}$. Note that by the product rule

$$(AB)_{ij} = \sum_{l=1}^{n} a_{il} b_{lj}$$

is nonnegative for all $i, j \in \{1, 2, \ldots, n\}$. Moreover, the sum of the entries of $AB$ in the $i$th column is

$$\sum_{k=1}^{n} (AB)_{ki} = \sum_{k=1}^{n} \sum_{j=1}^{n} a_{kj} b_{ji} = \sum_{j=1}^{n} \left( \sum_{k=1}^{n} a_{kj} \right) b_{ji} = \sum_{j=1}^{n} b_{ji} = 1,$$

where we used once the fact that $A$ is left stochastic (so that $\sum_{k=1}^{n} a_{kj} = 1$ for all $j$) and once the fact that $B$ is left stochastic (hence $\sum_{j=1}^{n} b_{ji} = 1$ for all $i$). The result follows. □
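Part a) of Problem 1.32 can be tried out on randomly generated left stochastic matrices. In this sketch (our own construction, not from the text) each column is built from random nonnegative integer weights divided by their sum, using exact `Fraction` arithmetic so that "the column sums equal 1" can be tested with `==`.

```python
import random
from fractions import Fraction


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def is_left_stochastic(M):
    """Nonnegative entries and every column summing to 1."""
    n = len(M)
    nonnegative = all(x >= 0 for row in M for x in row)
    columns_ok = all(sum(M[k][i] for k in range(n)) == 1 for i in range(n))
    return nonnegative and columns_ok


def random_left_stochastic(n, rng):
    """Build each column from random nonnegative weights normalised to sum 1."""
    columns = []
    for _ in range(n):
        weights = [rng.randint(0, 5) for _ in range(n)]
        weights[rng.randrange(n)] += 1  # guarantees a positive column sum
        total = sum(weights)
        columns.append([Fraction(w, total) for w in weights])
    # columns[j][i] is the (i, j)-entry, so rearrange into a list of rows
    return [[columns[j][i] for j in range(n)] for i in range(n)]


rng = random.Random(7)
A = random_left_stochastic(4, rng)
B = random_left_stochastic(4, rng)
print(is_left_stochastic(A), is_left_stochastic(B), is_left_stochastic(matmul(A, B)))
# True True True
```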


Problem 1.33. Let $(E_{ij})_{1 \leq i,j \leq n}$ be the canonical basis of $M_n(F)$. Prove that if $i, j, k, l \in \{1, 2, \ldots, n\}$, then

$$E_{ij} E_{kl} = \delta_{jk} E_{il},$$

where $\delta_{jk}$ equals 1 if $j = k$ and 0 otherwise.


Solution. We use the product rule: let $u, v \in \{1, 2, \ldots, n\}$, then

$$(E_{ij} E_{kl})_{uv} = \sum_{w=1}^{n} (E_{ij})_{uw} (E_{kl})_{wv}.$$

Now $(E_{ab})_{cd}$ is zero unless $a = c$ and $b = d$, and it is equal to 1 if the previous two equalities are satisfied. Thus $(E_{ij})_{uw}(E_{kl})_{wv}$ is zero unless $i = u$, $j = w$ and $k = w$, $l = v$. The last equalities can never happen if $j \neq k$, so if $j \neq k$, then $(E_{ij}E_{kl})_{uv} = 0$ for all $u, v \in \{1, 2, \ldots, n\}$. We conclude that $E_{ij}E_{kl} = 0$ when $j \neq k$.

Assuming now that $j = k$, the previous discussion yields $(E_{ij}E_{kl})_{uv} = 1$ if $u = i$ and $v = l$, and it equals 0 otherwise. In other words,

$$(E_{ij} E_{kl})_{uv} = (E_{il})_{uv}$$

for all $u, v \in \{1, 2, \ldots, n\}$. Thus $E_{ij}E_{kl} = E_{il}$ in this case, as desired. □
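For a small size the rule $E_{ij}E_{kl} = \delta_{jk}E_{il}$ can be verified exhaustively. The sketch below (hypothetical helper `E` builds the canonical basis matrix with 1-based indices, as in the text) checks all $3^4 = 81$ index combinations for $n = 3$.

```python
from itertools import product


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def E(i, j, n):
    """Canonical basis matrix E_ij (1-based indices): 1 in position (i, j), 0 elsewhere."""
    return [[1 if (u, v) == (i, j) else 0 for v in range(1, n + 1)]
            for u in range(1, n + 1)]


n = 3
zero = [[0] * n for _ in range(n)]
ok = all(
    matmul(E(i, j, n), E(k, l, n)) == (E(i, l, n) if j == k else zero)
    for i, j, k, l in product(range(1, n + 1), repeat=4)
)
print(ok)  # True
```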


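Problem 1.34. Let $(E_{ij})_{1 \leq i,j \leq n}$ be the canonical basis of $M_n(F)$. Let $i, j \in \{1, 2, \ldots, n\}$ and consider a matrix $A = [a_{ij}] \in M_n(F)$.

a) Prove that

$$A E_{ij} = \begin{pmatrix} 0 & \cdots & 0 & a_{1i} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & a_{2i} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & a_{ni} & 0 & \cdots & 0 \end{pmatrix},$$

the only possibly nonzero entries being in the $j$th column.

b) Prove that

$$E_{ij} A = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ a_{j1} & a_{j2} & \cdots & a_{jn} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix},$$

the only possibly nonzero entries being in the $i$th row.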


Solution. a) Write

$$A = \sum_{k,l=1}^{n} a_{kl} E_{kl}.$$

Using Problem 1.33, we can write

$$A E_{ij} = \sum_{k,l=1}^{n} a_{kl} E_{kl} E_{ij} = \sum_{k,l=1}^{n} a_{kl} \delta_{li} E_{kj} = \sum_{k=1}^{n} a_{ki} E_{kj} = a_{1i} E_{1j} + a_{2i} E_{2j} + \cdots + a_{ni} E_{nj}.$$

Coming back to the definition of the matrices $E_{1j}, \ldots, E_{nj}$, the result follows.

b) The proof is identical and left to the reader. □


Problem 1.35. Prove that a matrix $A \in M_n(F)$ commutes with all matrices in $M_n(F)$ if and only if $A = cI_n$ for some scalar $c \in F$.


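Solution. If $A = cI_n$ for some scalar $c \in F$, then $AB = cB$ and $BA = cB$ for all $B \in M_n(F)$, hence $AB = BA$ for all matrices $B \in M_n(F)$. Conversely, suppose that $A$ commutes with all matrices $B \in M_n(F)$. Then $A$ commutes with $E_{ij}$ for all $i, j \in \{1, 2, \ldots, n\}$. Using Problem 1.34 we obtain the equality $AE_{ij} = E_{ij}A$, that is

$$\begin{pmatrix} 0 & \cdots & 0 & a_{1i} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & a_{2i} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & a_{ni} & 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ a_{j1} & a_{j2} & \cdots & a_{jn} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix},$$

where the left-hand side has its only possibly nonzero entries in the $j$th column and the right-hand side in the $i$th row.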



If $i \neq j$, considering the $(j, j)$-entry in both matrices appearing in the previous equality yields $a_{ji} = 0$, thus $a_{ij} = 0$ for $i \neq j$ and $A$ is diagonal. Considering now the $(i, j)$-entry in the same equality yields $a_{ii} = a_{jj}$ for all $i, j$, and so all diagonal entries of $A$ are equal. We conclude that $A = a_{11} I_n$ and the problem is solved. □
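Over a small finite field the statement of Problem 1.35 can be checked exhaustively. The sketch below (our own helper names) works in $M_2(\mathbb{F}_2)$, where arithmetic is done modulo 2, and lists every matrix commuting with all 16 matrices; only the scalar matrices $0 \cdot I_2$ and $1 \cdot I_2$ should survive.

```python
from itertools import product

P = 2  # we work over the field F_2, i.e. arithmetic mod 2
n = 2


def matmul_mod(A, B):
    """Product rule modulo P; matrices are tuples of row tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(n)) % P for j in range(n))
                 for i in range(n))


# All 2^4 = 16 matrices in M_2(F_2), as tuples of rows.
all_matrices = [tuple(tuple(entries[n * i:n * (i + 1)]) for i in range(n))
                for entries in product(range(P), repeat=n * n)]

center = [A for A in all_matrices
          if all(matmul_mod(A, B) == matmul_mod(B, A) for B in all_matrices)]
print(center)  # [((0, 0), (0, 0)), ((1, 0), (0, 1))]
```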


Problem 1.36. Find all matrices $B \in M_3(\mathbb{C})$ which commute with the matrix

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$


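Solution. Let $B = [b_{ij}]$ be a matrix commuting with $A$. Using the product rule, we obtain

$$AB = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ 2b_{21} & 2b_{22} & 2b_{23} \\ 3b_{31} & 3b_{32} & 3b_{33} \end{pmatrix}$$

and

$$BA = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} b_{11} & 2b_{12} & 3b_{13} \\ b_{21} & 2b_{22} & 3b_{23} \\ b_{31} & 2b_{32} & 3b_{33} \end{pmatrix}.$$

Comparing the entries in the equality $AB = BA$ yields

$$b_{12} = b_{13} = b_{21} = b_{23} = b_{31} = b_{32} = 0,$$

and conversely if these equalities are satisfied, then $AB = BA$. We conclude that the solutions of the problem are the matrices of the form

$$B = \begin{pmatrix} b_{11} & 0 & 0 \\ 0 & b_{22} & 0 \\ 0 & 0 & b_{33} \end{pmatrix},$$

that is, the diagonal matrices. □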
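A brute-force check of Problem 1.36 on a finite family of candidates: the sketch below (our own restriction to matrices with entries in $\{0, 1\}$, chosen only to keep the search small) tests all 512 such $3 \times 3$ matrices and finds that exactly the 8 diagonal ones commute with $A$.

```python
from itertools import product


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def is_diagonal(M):
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(len(M)) if i != j)


A = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]

# Test every 3 x 3 matrix with entries in {0, 1} (2^9 = 512 candidates).
commuting = []
for entries in product((0, 1), repeat=9):
    B = [list(entries[3 * i:3 * i + 3]) for i in range(3)]
    if matmul(A, B) == matmul(B, A):
        commuting.append(B)

print(len(commuting))                          # 8
print(all(is_diagonal(B) for B in commuting))  # True
```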


Problem 1.37. A $3 \times 3$ matrix $A \in M_3(\mathbb{R})$ is called circulant if there are real numbers $a, b, c$ such that

$$A = \begin{pmatrix} a & b & c \\ c & a & b \\ b & c & a \end{pmatrix}.$$

a) Prove that the sum and the product of two circulant matrices are circulant matrices.

b) Prove that any two circulant matrices commute.
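Solution. Let $A = \begin{pmatrix} a & b & c \\ c & a & b \\ b & c & a \end{pmatrix}$ and $B = \begin{pmatrix} x & y & z \\ z & x & y \\ y & z & x \end{pmatrix}$ be two circulant matrices.

a) Note that

$$A + B = \begin{pmatrix} a+x & b+y & c+z \\ c+z & a+x & b+y \\ b+y & c+z & a+x \end{pmatrix}$$

is a circulant matrix. Using the product rule we compute

$$AB = \begin{pmatrix} u & v & w \\ w & u & v \\ v & w & u \end{pmatrix},$$

where

$$u = ax + bz + cy, \quad v = ay + bx + cz, \quad w = az + by + cx.$$

Thus $AB$ is also a circulant matrix.

b) Similarly, using the product rule we check that

$$BA = \begin{pmatrix} u & v & w \\ w & u & v \\ v & w & u \end{pmatrix} = AB.$$

□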

Problem 1.38. If $A, B \in M_n(\mathbb{C})$ are matrices satisfying

$$A^2 = B^2 = (AB)^2 = I_n,$$

prove that $A$ and $B$ commute.

Solution. Multiplying the relation $ABAB = I_n$ by $A$ on the left and by $B$ on the right, we obtain

$$A^2 BAB^2 = AB.$$

By assumption, the left-hand side equals $I_n BA I_n = BA$, thus $BA = AB$. □
1.3.1 Problems for Practice


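1. Consider the matrices

$$A = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}.$$

Which of the products $AB$ and $BA$ make sense? For each product which makes sense, compute the entries of the product.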
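2. Consider the matrices

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

in $M_3(\mathbb{F}_2)$. Compute $AB$ and $BA$.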
3. Consider the matrices 


102 0 1 
A= -10 |, B=j]2-1 
111 10 


Which of the products $A^2$, $AB$, $BA$, $B^2$ make sense? Compute all products that make sense.
4. Let A = 13 
; oa |e 


a) Find all matrices $B \in M_2(\mathbb{C})$ which commute with $A$.
b) Find all matrices $B \in M_2(\mathbb{C})$ for which $AB + BA$ is the zero matrix.


5. Determine all matrices $A \in M_2(\mathbb{R})$ commuting with the matrix

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$


1x 
. 1 . = 
6. Let G be the set of matrices of the form Ria f 1 | with x € (—1, 1). Prove 


that the product of two elements of G is an element of G. 


7. (matrix representation of C) Let G be the set of matrices of the form 5 a] 
a 
witha,b ER. 


a) Prove that the sum and product of two elements of $G$ are in $G$.
b) Consider the map $f : G \to \mathbb{C}$ defined by


[sa |) ser 


Prove that $f$ is a bijective map satisfying $f(A + B) = f(A) + f(B)$ and $f(AB) = f(A)f(B)$ for all $A, B \in G$.


c) Use this to compute the nth power of the matrix k a | 
a 


8. For any real number $x$ let

$$A(x) = \begin{pmatrix} 1-x & 0 & x \\ 0 & 1 & 0 \\ x & 0 & 1-x \end{pmatrix}.$$

a) Prove that for all real numbers $a, b$ we have

$$A(a)A(b) = A(a + b - 2ab).$$

b) Given a real number $x$, compute $A(x)^n$.
9. Compute $A^n$, where

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$


10. a) Give a detailed proof, by induction on $k$, for the binomial formula: if $A, B \in M_n(F)$ commute, then

$$(A+B)^k = \sum_{j=0}^{k} \binom{k}{j} A^{k-j} B^j.$$

b) Give a counterexample to the binomial formula if we drop the hypothesis that $A$ and $B$ commute.
11. a) Let

$$B = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & -1 \\ 1 & 1 & 0 \end{pmatrix}.$$

Prove that $B^3 = O_3$.
b) Let $a$ be a real number. Using part a) and the binomial formula, compute $A^n$, where

$$A = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & -a \\ a & a & 1 \end{pmatrix}.$$

12. Let

$$A = \begin{pmatrix} 1 & 0 & 1 \\ -1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$

a) Prove that $(A - I_3)^3 = O_3$.
b) Compute $A^n$ for all positive integers $n$, by using part a) and the binomial formula.

13. a) Prove that the matrix

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$

satisfies $(A - I_3)^3 = O_3$.
b) Compute $A^n$ for all positive integers $n$.

14. a) Prove that the matrix

$$A = \begin{pmatrix} 2 & 3 & 4 \\ 4 & 2 & 0 \\ -3 & 0 & 2 \end{pmatrix}$$

satisfies $(A - 2I_3)^3 = O_3$.
b) Compute $A^n$ for all positive integers $n$.

15. Suppose that $A \in M_n(\mathbb{C})$ is a diagonal matrix whose diagonal entries are pairwise distinct. Let $B \in M_n(\mathbb{C})$ be a matrix such that $AB = BA$. Prove that $B$ is diagonal.

16. A matrix $A \in M_n(\mathbb{R})$ is called a permutation matrix if each row and column of $A$ has an entry equal to 1 and all other entries equal to 0. Prove that the product of two permutation matrices is a permutation matrix.

17. Consider a permutation $\sigma$ of $1, 2, \ldots, n$, that is, a bijective map

$$\sigma : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}.$$

We define the associated permutation matrix $P_\sigma$ as follows: the $(i, j)$-entry of $P_\sigma$ is equal to 1 if $i = \sigma(j)$ and 0 otherwise.

a) Prove that any permutation matrix is of the form $P_\sigma$ for a unique permutation $\sigma$.

b) Deduce that there are $n!$ permutation matrices.

c) Prove that

$$P_{\sigma_1} \cdot P_{\sigma_2} = P_{\sigma_1 \circ \sigma_2}$$

for all permutations $\sigma_1, \sigma_2$.

d) Given a matrix $B \in M_n(F)$, describe the matrices $P_\sigma B$ and $BP_\sigma$ in terms of $B$ and of the permutation $\sigma$.
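Readers who want a numerical check of part c) of the exercise on permutation matrices $P_\sigma$ before writing a proof can use the following sketch. It is our own illustration: permutations are stored 0-based as tuples (`sigma[j]` is the image of `j`), which shifts the text's indices by one but does not change the statement. All $3! \cdot 3! = 36$ pairs are tested.

```python
from itertools import permutations


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def perm_matrix(sigma):
    """P_sigma: (i, j)-entry is 1 iff i == sigma(j)."""
    n = len(sigma)
    return [[1 if i == sigma[j] else 0 for j in range(n)] for i in range(n)]


def compose(s1, s2):
    """The permutation s1 o s2, i.e. j -> s1(s2(j))."""
    return tuple(s1[s2[j]] for j in range(len(s2)))


n = 3
perms = list(permutations(range(n)))  # all n! permutations of {0, ..., n-1}
ok = all(matmul(perm_matrix(s1), perm_matrix(s2)) == perm_matrix(compose(s1, s2))
         for s1 in perms for s2 in perms)
print(len(perms), ok)  # 6 True
```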


1.4 Block Matrices 


A sub-matrix of a matrix $A \in M_{m,n}(F)$ is a matrix obtained from $A$ by deleting
rows and/or columns of A (note that A itself is a sub-matrix of A). A matrix can be 
partitioned into sub-matrices by drawing horizontal or vertical lines between some 
of its rows or columns. We call such a matrix a block (or partitioned) matrix and 
we call the corresponding sub-matrices blocks. 

Here are a few examples of partitioned matrices: 


1123 0100 
O12), hpl. 1111 
0/0 1 1 1213 


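We can see a partitioned matrix as a “matrix of matrices”: the typical shape of a partitioned matrix $A$ of size $m \times n$ is

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ A_{21} & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ A_{l1} & A_{l2} & \cdots & A_{lk} \end{pmatrix},$$

where $A_{ij}$ is a matrix of size $m_i \times n_j$ for some positive integers $m_1, \ldots, m_l$ and $n_1, \ldots, n_k$ with $m_1 + m_2 + \cdots + m_l = m$ and $n_1 + n_2 + \cdots + n_k = n$. If $l = k$, we call the blocks $A_{11}, \ldots, A_{kk}$ the diagonal blocks and we say that $A$ is block diagonal if all blocks of $A$ but the diagonal ones are zero. Thus a block diagonal matrix is of the form

$$A = \begin{pmatrix} A_{11} & 0 & \cdots & 0 \\ 0 & A_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_{kk} \end{pmatrix}.$$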


An important advantage is given by the following rules for addition and multiplication of block matrices (which follow directly from the rules of addition and multiplication of matrices; we warn the reader, however, that the proof of the multiplication rule is quite involved from a notational point of view!):


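• If

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ A_{21} & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & & \vdots \\ A_{l1} & A_{l2} & \cdots & A_{lk} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1k} \\ B_{21} & B_{22} & \cdots & B_{2k} \\ \vdots & \vdots & & \vdots \\ B_{l1} & B_{l2} & \cdots & B_{lk} \end{pmatrix}$$

with $A_{ij}$ and $B_{ij}$ of the same size for all $i, j$ (so the rows and columns of $B$ and $A$ are partitioned in the same way), then

$$A + B = \begin{pmatrix} A_{11} + B_{11} & A_{12} + B_{12} & \cdots & A_{1k} + B_{1k} \\ A_{21} + B_{21} & A_{22} + B_{22} & \cdots & A_{2k} + B_{2k} \\ \vdots & \vdots & & \vdots \\ A_{l1} + B_{l1} & A_{l2} + B_{l2} & \cdots & A_{lk} + B_{lk} \end{pmatrix}.$$

• If

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ A_{21} & A_{22} & \cdots & A_{2k} \\ \vdots & \vdots & & \vdots \\ A_{l1} & A_{l2} & \cdots & A_{lk} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1r} \\ B_{21} & B_{22} & \cdots & B_{2r} \\ \vdots & \vdots & & \vdots \\ B_{k1} & B_{k2} & \cdots & B_{kr} \end{pmatrix}$$

are $m \times n$, respectively $n \times p$ partitioned matrices, with $A_{ij}$ of size $m_i \times n_j$ and $B_{ij}$ of size $n_i \times p_j$, then

$$AB = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1r} \\ C_{21} & C_{22} & \cdots & C_{2r} \\ \vdots & \vdots & & \vdots \\ C_{l1} & C_{l2} & \cdots & C_{lr} \end{pmatrix},$$

where

$$C_{ij} = \sum_{u=1}^{k} A_{iu} B_{uj}.$$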
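The block multiplication rule can be tested on a concrete partition. In the sketch below (helper names and the particular partition are our own choices) a $5 \times 4$ matrix $A$ and a $4 \times 3$ matrix $B$ are cut into blocks compatibly (rows of $A$ as $2 + 3$, the shared dimension as $1 + 3$, columns of $B$ as $2 + 1$), each block $C_{ij} = \sum_u A_{iu} B_{uj}$ is computed, and the reassembled matrix is compared with the ordinary product.

```python
import random


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matadd(A, B):
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def block(M, rows, cols):
    """Sub-matrix of M keeping the given row indices and column indices."""
    return [[M[i][j] for j in cols] for i in rows]


def assemble(blocks):
    """Glue a 2D list of blocks (a "matrix of matrices") back into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([x for blk in block_row for x in blk[r]])
    return out


rng = random.Random(3)
m, n, p = 5, 4, 3
A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(m)]
B = [[rng.randint(-5, 5) for _ in range(p)] for _ in range(n)]

row_parts = [range(0, 2), range(2, 5)]  # rows of A: 2 + 3
mid_parts = [range(0, 1), range(1, 4)]  # shared dimension n: 1 + 3
col_parts = [range(0, 2), range(2, 3)]  # columns of B: 2 + 1

C_blocks = []
for R in row_parts:
    block_row = []
    for S in col_parts:
        # C_ij = sum over u of A_iu B_uj
        acc = [[0] * len(S) for _ in R]
        for U in mid_parts:
            acc = matadd(acc, matmul(block(A, R, U), block(B, U, S)))
        block_row.append(acc)
    C_blocks.append(block_row)

print(assemble(C_blocks) == matmul(A, B))  # True
```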


1.4.1 Problems for Practice 


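If $A = [a_{ij}] \in M_{m_1, n_1}(F)$ and $B \in M_{m_2, n_2}(F)$ are matrices, the Kronecker product or tensor product of $A$ and $B$ is the matrix $A \otimes B \in M_{m_1 m_2, n_1 n_2}(F)$ defined by

$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n_1}B \\ a_{21}B & a_{22}B & \cdots & a_{2n_1}B \\ \vdots & \vdots & & \vdots \\ a_{m_1 1}B & a_{m_1 2}B & \cdots & a_{m_1 n_1}B \end{pmatrix}.$$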
1. Compute the Kronecker product of the matrices 
010 
A=]100 and s=[1 Al 
001 
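2. Do we always have $A \otimes B = B \otimes A$?
3. Check that $I_m \otimes I_n = I_{mn}$.
4. Prove that if $A_1 \in M_{m_1, n_1}(F)$, $A_2 \in M_{n_1, r_1}(F)$, $B_1 \in M_{m_2, n_2}(F)$ and $B_2 \in M_{n_2, r_2}(F)$, then

$$(A_1 \otimes B_1) \cdot (A_2 \otimes B_2) = (A_1 A_2) \otimes (B_1 B_2).$$

5. Prove that if $A \in M_m(F)$ and $B \in M_n(F)$, then

$$A \otimes B = (A \otimes I_n) \cdot (I_m \otimes B).$$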
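Readers who want to experiment with exercises 3 to 5 before proving them can use the following sketch of the Kronecker product (our own implementation `kron` of the block definition above, with hypothetical helpers `identity` and `rnd`). It checks each identity on one random integer instance; this is evidence, not a proof.

```python
import random


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


def kron(A, B):
    """Kronecker product: the block matrix whose (i, j) block is a_ij * B."""
    m1, n1, m2, n2 = len(A), len(A[0]), len(B), len(B[0])
    # Row i of A (x) B lies in block row i // m2, at row i % m2 inside the block;
    # columns are handled in the same way.
    return [[A[i // m2][j // n2] * B[i % m2][j % n2] for j in range(n1 * n2)]
            for i in range(m1 * m2)]


rng = random.Random(4)


def rnd(rows, cols):
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


# Exercise 3: I_2 (x) I_3 = I_6
print(kron(identity(2), identity(3)) == identity(6))  # True

# Exercise 4 (mixed product), with A1: 2x3, A2: 3x2, B1: 2x2, B2: 2x3
A1, A2, B1, B2 = rnd(2, 3), rnd(3, 2), rnd(2, 2), rnd(2, 3)
print(matmul(kron(A1, B1), kron(A2, B2)) == kron(matmul(A1, A2), matmul(B1, B2)))  # True

# Exercise 5: A (x) B = (A (x) I_n)(I_m (x) B), here with m = 2, n = 3
A, B = rnd(2, 2), rnd(3, 3)
print(kron(A, B) == matmul(kron(A, identity(3)), kron(identity(2), B)))  # True
```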


1.5 Invertible Matrices 


Let $n$ be a positive integer. We say that a matrix $A \in M_n(F)$ is invertible or non-singular if there is a matrix $B \in M_n(F)$ such that

$$AB = BA = I_n.$$

Such a matrix $B$ is then necessarily unique, for if $C$ is another matrix with the same properties, we obtain

$$C = I_n \cdot C = (BA)C = B(AC) = BI_n = B.$$

The matrix $B$ is called the inverse of $A$ and denoted $A^{-1}$.
Let us establish a few basic properties of invertible matrices: 


Proposition 1.39. a) If $c$ is a nonzero scalar, then $cI_n$ is invertible.
b) If $A$ is invertible, then so is $A^{-1}$, and $(A^{-1})^{-1} = A$.
c) If $A, B \in M_n(F)$ are invertible, then so is $AB$ and

$$(AB)^{-1} = B^{-1}A^{-1}.$$

Proof. a) The matrix $c^{-1}I_n$ is an inverse of the matrix $cI_n$.

b) Let $B = A^{-1}$, then $BA = AB = I_n$, showing that $B$ is invertible, with inverse $A$.

c) By assumption $A^{-1}$ and $B^{-1}$ exist, so the matrix $C = B^{-1}A^{-1}$ makes sense. We compute

$$(AB)C = ABB^{-1}A^{-1} = AI_nA^{-1} = AA^{-1} = I_n,$$

and similarly

$$C(AB) = B^{-1}A^{-1}AB = B^{-1}I_nB = B^{-1}B = I_n,$$

showing that $AB$ is invertible, with inverse $C$. □


Remark 1.40. a) One should be careful when computing inverses of products of matrices, for the formula $(AB)^{-1} = A^{-1}B^{-1}$ is not correct, unless $A$ and $B$ commute. Similarly, we have

$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$$

and not $A^{-1}B^{-1}C^{-1}$ in general. Thus the inverse of the product equals the product of the inverses in the reverse order.

b) By the proposition, invertible matrices are stable under product, but they are definitely not stable under addition: the matrices $I_n$ and $-I_n$ are invertible, but their sum $O_n$ is not invertible (as $O_n A = O_n \neq I_n$ for any matrix $A \in M_n(F)$).
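A small concrete instance of Remark 1.40 a): for the non-commuting matrices chosen below (our own example, whose inverses are verified in the code rather than taken on trust), $B^{-1}A^{-1}$ inverts $AB$ while $A^{-1}B^{-1}$ does not.

```python
def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


I2 = [[1, 0], [0, 1]]
A, A_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]
B, B_inv = [[1, 0], [1, 1]], [[1, 0], [-1, 1]]
# Confirm that the proposed inverses really are two-sided inverses.
assert matmul(A, A_inv) == matmul(A_inv, A) == I2
assert matmul(B, B_inv) == matmul(B_inv, B) == I2

AB = matmul(A, B)
print(matmul(AB, matmul(B_inv, A_inv)) == I2)  # True: reverse order works
print(matmul(AB, matmul(A_inv, B_inv)) == I2)  # False: same order fails here
```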


The set of invertible matrices plays an extremely important role in linear algebra, 
so it deserves a definition and a special notation: 


Definition 1.41. The set of invertible matrices $A \in M_n(F)$ is called the general linear group and denoted $GL_n(F)$.
Unfortunately, with the tools introduced so far it is illusory to hope to understand the fine properties of the general linear group $GL_n(F)$. Once we develop the theory of linear algebra from the point of view of vector spaces and linear transformations, we will have a much more powerful theory that will make it easier to understand invertible matrices. Just to give a hint of the difficulty of the theory, try to prove by bare hands that if $A, B \in M_n(F)$ satisfy $AB = I_n$, then $A$ is invertible. The key point is proving that this equality forces $BA = I_n$, but this is definitely not trivial simply by coming back to the multiplication of matrices! In subsequent chapters we will develop a theory of determinants which allows a much cleaner characterization of invertible matrices. Also, in subsequent chapters we will describe an algorithm, based on operations on the rows of a matrix, which gives an efficient way of solving the following problem: given a square matrix $A$, decide whether $A$ is invertible and compute its inverse if $A$ is invertible. This problem is not easy to solve with the tools we have introduced so far.


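Problem 1.42. Consider the matrix $A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. Is the matrix $A$ invertible? If this is the case, compute $A^{-1}$.

Solution. Since we don't have any strong tools at our disposal for the moment, let us use brute force and look for a matrix $\begin{pmatrix} a & b & c \\ x & y & z \\ u & v & w \end{pmatrix}$ such that

$$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} a & b & c \\ x & y & z \\ u & v & w \end{pmatrix} = I_3.$$

The left-hand side equals $\begin{pmatrix} x & y & z \\ a & b & c \\ u & v & w \end{pmatrix}$, so this last matrix should be equal to $I_3$. This gives a unique solution $x = b = w = 1$, all other variables being equal to 0, that is, the only candidate is $A$ itself. Since we have just checked that $A \cdot A = I_3$, this matrix is both a left and a right inverse of $A$. We conclude that $A$ is invertible and

$$A^{-1} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

□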


It is clear that the method used to find the inverse of $A$ in the previous problem is not efficient and quickly becomes very painful even for $3 \times 3$ matrices. We will see later on a much more powerful approach, but we would like to present yet another method, which can be fairly useful in some situations (especially when the matrix has some nice symmetry properties or if it has many zeros).

Consider a matrix $A \in M_n(F)$ and a vector $b \in F^n$. Assume that we can always solve the system $AX = b$ with $X \in F^n$ and that it has a unique solution. Then one can prove (we will see this in a later chapter, so we will take it for granted in this chapter) that $A$ is invertible and so the solution of the system is given by $X = A^{-1}b$ (multiply the relation $AX = b$ by $A^{-1}$). On the other hand, assume that we are able to solve the system by hand; then we have a description of $X$ in terms of the coordinates of $b$. Thus we will know explicitly $A^{-1}b$ for all vectors $b \in F^n$, and this is enough to find $A^{-1}$. In practice, the resolution of the system will show that

$$A^{-1}b = \begin{pmatrix} c_{11}b_1 + c_{12}b_2 + \cdots + c_{1n}b_n \\ c_{21}b_1 + c_{22}b_2 + \cdots + c_{2n}b_n \\ \vdots \\ c_{n1}b_1 + c_{n2}b_2 + \cdots + c_{nn}b_n \end{pmatrix}$$

for some scalars $c_{ij}$, independent of $b_1, \ldots, b_n$. Letting $b$ be the $i$th vector of the canonical basis of $F^n$, the left-hand side $A^{-1}b$ is simply the $i$th column of $A^{-1}$, while the right-hand side is the $i$th column of the matrix $[c_{ij}]$. Since the two sides are equal for all $i$, we conclude that

$$A^{-1} = [c_{ij}].$$

Note that once the system is solved, it is very easy to write the matrix $A^{-1}$ directly by looking at the expression of $A^{-1}b$. Namely, if the first coordinate of $A^{-1}b$ is $c_{11}b_1 + c_{12}b_2 + \cdots + c_{1n}b_n$, then the first row of $A^{-1}$ is $(c_{11}, c_{12}, \ldots, c_{1n})$. Of course, the key part of the argument is the resolution of the linear system $AX = b$, and this will be discussed in a subsequent chapter. We will therefore limit ourselves in this chapter to rather simple systems, which can be solved by hand without any further theory.
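The method just described can be sketched in code: solve $AX = e_i$ for each canonical basis vector $e_i$ and use the solutions as the columns of $A^{-1}$. Since the general resolution of linear systems only comes later, this sketch deliberately restricts itself (our own simplifying assumption) to upper-triangular matrices with nonzero diagonal entries, which can be solved by back substitution, starting from the last equation as in the examples below. The helper names are hypothetical, and the test matrix is the one of Problem 1.44.

```python
from fractions import Fraction


def back_substitute(U, b):
    """Solve U x = b for an upper-triangular U with nonzero diagonal (exact arithmetic)."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):  # last equation first
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (Fraction(b[i]) - s) / U[i][i]
    return x


def inverse_by_solving(U):
    """Column i of the inverse is the solution X of U X = e_i."""
    n = len(U)
    columns = [back_substitute(U, [1 if r == i else 0 for r in range(n)])
               for i in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]


A = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
print(inverse_by_solving(A) == [[1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1], [0, 0, 0, 1]])
# True
```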
Let us see a few concrete examples: 


Problem 1.43. Compute the inverse of the matrix A in the previous problem using 
the method we have just described. 


Solution. Given a vector $b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \in F^3$, we try to solve the system $AX = b$, with $X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$. The system can be written as

$$x_2 = b_1, \quad x_1 = b_2, \quad x_3 = b_3,$$

or equivalently

$$x_1 = b_2, \quad x_2 = b_1, \quad x_3 = b_3.$$

Since this system has a solution for all $b \in F^3$, we deduce that $A$ is invertible and that for all $b \in F^3$ we have

$$A^{-1}b = X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_2 \\ b_1 \\ b_3 \end{pmatrix}.$$

The first coordinate of $A^{-1}b$ is $b_2$, thus the first row of $A^{-1}$ is $(0, 1, 0)$; the second coordinate of $A^{-1}b$ is $b_1$, so the second row of $A^{-1}$ is $(1, 0, 0)$. Finally, the third coordinate of $A^{-1}b$ is $b_3$, so the third row of $A^{-1}$ is $(0, 0, 1)$. We conclude that

$$A^{-1} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

□


Problem 1.44. Consider the matrix

$$A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Prove that $A$ is invertible and compute $A^{-1}$.


Solution. Again, given a vector $b = \begin{pmatrix} b_1 \\ \vdots \\ b_4 \end{pmatrix} \in F^4$ we try to solve the system $AX = b$ with $X = \begin{pmatrix} x_1 \\ \vdots \\ x_4 \end{pmatrix}$. The system can be written

$$\begin{aligned} x_1 + x_2 + x_3 + x_4 &= b_1 \\ x_2 + x_3 + x_4 &= b_2 \\ x_3 + x_4 &= b_3 \\ x_4 &= b_4 \end{aligned}$$

and can be solved rather easily: the last equation gives $x_4 = b_4$. Subtracting the last equation from the third one yields $x_3 = b_3 - b_4$, then subtracting the third equation from the second one yields $x_2 = b_2 - b_3$ and finally $x_1 = b_1 - b_2$. Thus the system always has solutions and so $A$ is invertible, with

$$A^{-1}b = X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} b_1 - b_2 \\ b_2 - b_3 \\ b_3 - b_4 \\ b_4 \end{pmatrix}.$$

The first coordinate of $A^{-1}b$ being $b_1 - b_2$, we deduce that the first row of $A^{-1}$ is $[1 \ -1 \ 0 \ 0]$. Similarly, the coordinate $b_2 - b_3$ gives the second row of $A^{-1}$, namely $[0 \ 1 \ -1 \ 0]$, and so on. We end up with

$$A^{-1} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

□


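Problem 1.45. Let $n$ be a positive integer. Find the inverse of the matrix

$$\begin{pmatrix} 1 & 2 & 3 & \cdots & n \\ 0 & 1 & 2 & \cdots & n-1 \\ 0 & 0 & 1 & \cdots & n-2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.$$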


Solution. Let $A$ be the matrix whose inverse we are trying to compute. Given a vector $b \in \mathbb{R}^n$ with coordinates $b_1, b_2, \ldots, b_n$, let us try to solve the system $AX = b$. This system is equivalent to

$$\begin{aligned} x_1 + 2x_2 + 3x_3 + \cdots + nx_n &= b_1 \\ x_2 + 2x_3 + \cdots + (n-1)x_n &= b_2 \\ &\;\;\vdots \\ x_{n-1} + 2x_n &= b_{n-1} \\ x_n &= b_n. \end{aligned}$$

In principle one can easily solve it by starting with the last equation and successively determining $x_n, x_{n-1}, \ldots, x_1$ from the equations of the system. To make our life simpler, we subtract the second equation from the first, the third equation from the second, ..., the $n$th equation from the $(n-1)$th equation. We end up with the equivalent system

$$\begin{aligned} x_1 + x_2 + x_3 + \cdots + x_n &= b_1 - b_2 \\ x_2 + x_3 + \cdots + x_n &= b_2 - b_3 \\ &\;\;\vdots \\ x_{n-1} + x_n &= b_{n-1} - b_n \\ x_n &= b_n. \end{aligned}$$

Subtracting again consecutive equations yields

$$x_1 = b_1 - 2b_2 + b_3, \quad x_2 = b_2 - 2b_3 + b_4, \quad \ldots, \quad x_{n-1} = b_{n-1} - 2b_n, \quad x_n = b_n.$$

Since the system $AX = b$ always has solutions, we deduce that $A$ is invertible. Moreover, the system is equivalent to $A^{-1}b = X$ and the expressions of $x_1, x_2, \ldots, x_n$ give us the rows of $A^{-1}$: $x_1 = b_1 - 2b_2 + b_3$ shows that the first row of $A^{-1}$ equals $(1, -2, 1, 0, \ldots, 0)$, ..., $x_{n-1} = b_{n-1} - 2b_n$ shows that the next-to-last row is $(0, 0, \ldots, 0, 1, -2)$, and finally the last row of $A^{-1}$ is $(0, 0, \ldots, 0, 1)$. Thus

$$A^{-1} = \begin{pmatrix} 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 & -2 \\ 0 & 0 & \cdots & 0 & 0 & 1 \end{pmatrix}.$$

□
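The closed form found in Problem 1.45 can be checked for the first few values of $n$. In this sketch (helper names are our own; indices are 0-based) the matrix has entry $j - i + 1$ on or above the diagonal, and the claimed inverse has 1 on the diagonal, $-2$ just above it and 1 two places above it; both products with $A$ are compared with $I_n$ for $n = 1, \ldots, 8$.

```python
def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def A_matrix(n):
    """(i, j)-entry j - i + 1 on or above the diagonal, 0 below (0-based i, j)."""
    return [[j - i + 1 if j >= i else 0 for j in range(n)] for i in range(n)]


def claimed_inverse(n):
    """1 on the diagonal, -2 just above it, 1 two places above it, 0 elsewhere."""
    coeff = {0: 1, 1: -2, 2: 1}
    return [[coeff.get(j - i, 0) for j in range(n)] for i in range(n)]


print(all(matmul(A_matrix(n), claimed_inverse(n)) == identity(n)
          and matmul(claimed_inverse(n), A_matrix(n)) == identity(n)
          for n in range(1, 9)))  # True
```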


Problem 1.46. Let $A, B \in M_n(F)$ be matrices such that $AB = BA$. Prove that if $A$ is invertible, then $A^{-1}B = BA^{-1}$.

Solution. Multiply the relation $AB = BA$ on the left by $A^{-1}$ and on the right by $A^{-1}$. We obtain

$$A^{-1}ABA^{-1} = A^{-1}BAA^{-1}.$$

Since $A^{-1}A = I_n$, the left-hand side equals $BA^{-1}$. Since $AA^{-1} = I_n$, the right-hand side equals $A^{-1}B$. Thus $A^{-1}B = BA^{-1}$, as desired. □


Problem 1.47. Prove that a diagonal matrix $A \in M_n(F)$ is invertible if and only if all its diagonal entries are nonzero. Moreover, if this is the case, then $A^{-1}$ is also diagonal.

Solution. Let $A = [a_{ij}] \in M_n(F)$ be a diagonal matrix. If $B = [b_{ij}] \in M_n(F)$ is an arbitrary matrix, let us compute $AB$. Using the product rule, we have

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$

We have $a_{ik} = 0$ for $k \neq i$, since $A$ is diagonal. Thus

$$(AB)_{ij} = a_{ii} b_{ij}$$

and similarly

$$(BA)_{ij} = a_{jj} b_{ij}.$$

Suppose now that $a_{ii} \neq 0$ for all $i \in \{1, 2, \ldots, n\}$ and consider the diagonal matrix $B$ with diagonal entries $b_{ii} = \frac{1}{a_{ii}}$. Then the formulae in the previous paragraph yield $AB = BA = I_n$, thus $A$ is invertible and $A^{-1} = B$ is diagonal.

Conversely, suppose that $A$ is invertible and diagonal, thus we can find a matrix $B$ such that $AB = BA = I_n$. Thus for all $i \in \{1, \ldots, n\}$ we have

$$1 = (I_n)_{ii} = (AB)_{ii} = a_{ii} b_{ii},$$

hence $a_{ii} \neq 0$ for all $i$ and so all diagonal entries of $A$ are nonzero. □


Sometimes, it can be very easy to prove that a matrix $A$ is invertible and to compute its inverse, if we know that $A$ satisfies some algebraic equation. For instance, imagine that we knew that

$$A^3 + 3A + I_n = O_n.$$

Then $A^3 + 3A = -I_n$, which can be written as

$$A \cdot (-A^2 - 3I_n) = I_n.$$

On the other hand, we also have

$$(-A^2 - 3I_n) \cdot A = -A^3 - 3A = I_n,$$

thus $A$ is invertible and $A^{-1} = -A^2 - 3I_n$. In general, a similar argument shows that if $A \in M_n(\mathbb{C})$ satisfies an equation of the form

$$a_d A^d + a_{d-1} A^{d-1} + \cdots + a_0 I_n = O_n$$

for some complex numbers $a_0, \ldots, a_d$ with $a_0 \neq 0$, then $A$ is invertible and

$$A^{-1} = -\left( \frac{a_d}{a_0} A^{d-1} + \frac{a_{d-1}}{a_0} A^{d-2} + \cdots + \frac{a_1}{a_0} I_n \right).$$


Of course, it is totally mysterious how to find such an algebraic equation satisfied 
by A in general, but we will see much later how to naturally create such equations 
(this will already require a lot of nontrivial theory!). 

We discuss below two more such examples. 


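Problem 1.48. Consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 3 & 0 & -1 \end{pmatrix}.$$

a) Check that

$$A^3 - A^2 - 8A - 18I_3 = O_3.$$

b) Deduce that $A$ is invertible and compute $A^{-1}$.

Solution. a) We compute brutally, using the product rule

$$A^2 = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 3 & 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 3 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 8 & 4 & 6 \\ 13 & 5 & 2 \\ 0 & 6 & 4 \end{pmatrix}$$

and

$$A^3 = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 3 & 0 & -1 \end{pmatrix} \cdot \begin{pmatrix} 8 & 4 & 6 \\ 13 & 5 & 2 \\ 0 & 6 & 4 \end{pmatrix} = \begin{pmatrix} 34 & 20 & 14 \\ 29 & 31 & 26 \\ 24 & 6 & 14 \end{pmatrix}.$$

It follows that

$$A^3 - A^2 - 18I_3 = \begin{pmatrix} 8 & 16 & 8 \\ 16 & 8 & 24 \\ 24 & 0 & -8 \end{pmatrix} = 8A$$

and the result follows.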
b) We can write the identity in a) as follows:

$$A(A^2 - A - 8I_3) = 18I_3,$$

or equivalently

$$A \cdot \frac{1}{18}(A^2 - A - 8I_3) = I_3.$$

Similarly, we obtain

$$\frac{1}{18}(A^2 - A - 8I_3) \cdot A = I_3,$$

and this shows that $A$ is invertible and

$$A^{-1} = \frac{1}{18}(A^2 - A - 8I_3) = \frac{1}{18} \begin{pmatrix} -1 & 2 & 5 \\ 11 & -4 & -1 \\ -3 & 6 & -3 \end{pmatrix}.$$

□
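The computations of Problem 1.48 are easy to reproduce. The sketch below (plain Python; exact `Fraction` arithmetic for the division by 18) recomputes $A^2$ and $A^3$, checks that $A^3 - A^2 - 8A - 18I_3$ vanishes, and verifies that $\frac{1}{18}(A^2 - A - 8I_3)$ is a two-sided inverse.

```python
from fractions import Fraction


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2, 1], [2, 1, 3], [3, 0, -1]]
I3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
A2 = matmul(A, A)
A3 = matmul(A, A2)

# a) the residual A^3 - A^2 - 8A - 18 I_3 should be the zero matrix
residual = [[A3[i][j] - A2[i][j] - 8 * A[i][j] - 18 * I3[i][j] for j in range(3)]
            for i in range(3)]
print(residual == [[0, 0, 0]] * 3)  # True

# b) candidate inverse (A^2 - A - 8 I_3) / 18, in exact rational arithmetic
A_inv = [[Fraction(A2[i][j] - A[i][j] - 8 * I3[i][j], 18) for j in range(3)]
         for i in range(3)]
print(matmul(A, A_inv) == I3, matmul(A_inv, A) == I3)  # True True
print([[int(18 * x) for x in row] for row in A_inv])
# [[-1, 2, 5], [11, -4, -1], [-3, 6, -3]]
```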


Problem 1.49. Let $n > 1$ be an integer and let

$$\varepsilon = e^{\frac{2\pi i}{n}} = \cos\frac{2\pi}{n} + i \sin\frac{2\pi}{n}.$$

Let $F_n$ be the Fourier matrix of order $n$, whose $(j, k)$-entry is $\varepsilon^{(j-1)(k-1)}$ for $1 \leq j, k \leq n$.

a) Let $\overline{F}_n$ be the matrix whose $(j, k)$-entry is the complex conjugate of the $(j, k)$-entry of $F_n$. Prove that

$$F_n \cdot \overline{F}_n = \overline{F}_n \cdot F_n = nI_n.$$

b) Deduce that $F_n$ is invertible and compute its inverse.


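Solution. a) Let $j, k \in \{1, 2, \ldots, n\}$ and let us use the product rule to compute

$$(F_n \cdot \overline{F}_n)_{jk} = \sum_{l=1}^{n} (F_n)_{jl} (\overline{F}_n)_{lk} = \sum_{l=1}^{n} \varepsilon^{(j-1)(l-1)} \cdot \overline{\varepsilon}^{(l-1)(k-1)} = \sum_{l=1}^{n} \varepsilon^{(l-1)(j-k)},$$

the last equality being a consequence of the fact that $\overline{\varepsilon} = \varepsilon^{-1}$. Thus

$$(F_n \cdot \overline{F}_n)_{jk} = \sum_{l=1}^{n} \varepsilon^{(l-1)(j-k)} = \sum_{l=0}^{n-1} \left( \varepsilon^{j-k} \right)^{l}.$$

The last sum is the sum of a geometric progression with ratio $\varepsilon^{j-k}$. If $j = k$, then $\varepsilon^{j-k} = 1$, so the sum equals $n$, since each term is equal to 1 in this case. If $j \neq k$, then $0 < |j - k| < n$, so $\varepsilon^{j-k} \neq 1$ and we have

$$\sum_{l=0}^{n-1} \left( \varepsilon^{j-k} \right)^{l} = \frac{\left( \varepsilon^{j-k} \right)^{n} - 1}{\varepsilon^{j-k} - 1} = \frac{\left( \varepsilon^{n} \right)^{j-k} - 1}{\varepsilon^{j-k} - 1} = 0,$$

the last equality being a consequence of the formula $\varepsilon^n = 1$. We conclude that $(F_n \cdot \overline{F}_n)_{jk}$ equals $n$ when $j = k$ and equals 0 otherwise. It follows that

$$F_n \cdot \overline{F}_n = nI_n.$$

The equality $\overline{F}_n \cdot F_n = nI_n$ is proved in exactly the same way and we leave the details to the reader.

b) By part a) we can write

$$F_n \cdot \left( \frac{1}{n} \overline{F}_n \right) = \left( \frac{1}{n} \overline{F}_n \right) \cdot F_n = I_n,$$

thus $F_n$ is invertible and $F_n^{-1} = \frac{1}{n} \overline{F}_n$. □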
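Problem 1.49 can be illustrated numerically with Python's standard `cmath` module. Because complex exponentials are computed in floating point, this sketch compares matrices entrywise up to a small tolerance (our own choice, `1e-9`) instead of testing exact equality. Indices are 0-based, so the $(j, k)$-entry becomes $\varepsilon^{jk}$.

```python
import cmath


def fourier_matrix(n):
    """F_n with (j, k)-entry eps^((j-1)(k-1)); indices are 0-based here, so eps^(j*k)."""
    eps = cmath.exp(2j * cmath.pi / n)
    return [[eps ** (j * k) for k in range(n)] for j in range(n)]


def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def close(X, Y, tol=1e-9):
    """Entrywise comparison up to floating-point rounding."""
    return all(abs(X[i][j] - Y[i][j]) < tol
               for i in range(len(X)) for j in range(len(X[0])))


n = 6
F = fourier_matrix(n)
F_bar = [[z.conjugate() for z in row] for row in F]
n_identity = [[n if i == j else 0 for j in range(n)] for i in range(n)]
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

print(close(matmul(F, F_bar), n_identity), close(matmul(F_bar, F), n_identity))  # True True
F_inv = [[z / n for z in row] for row in F_bar]  # candidate inverse (1/n) * conj(F_n)
print(close(matmul(F, F_inv), identity))  # True
```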


1.5.1 Problems for Practice 


1. Find the inverse of the matrix 


shd 


2. For which real numbers x is the matrix 


a= [i3] 


invertible? For each such x, compute the inverse of A. 
3. Is the matrix


101 
A=]011] € MŒ) 
110 


invertible? If so, compute its inverse. 


4. Same problem with the matrix


101 
A=]|011 | € MF). 
010 


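5. Consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \in M_5(\mathbb{R}).$$

Prove that $A$ is invertible and compute its inverse.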


6. Consider the matrix

$$A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}.$$

a) Compute the inverse of $A$ by solving, for each $b \in \mathbb{R}^4$, the system $AX = b$.
b) Prove that $A^2 = 3I_4 + 2A$. Deduce a new way of computing $A^{-1}$.


7. Let $A$ be the matrix

$$A = \begin{pmatrix} 3 & -1 & 2 \\ 5 & -2 & 3 \\ -1 & 0 & -1 \end{pmatrix}.$$

a) Check that $A^3 = O_3$.
b) Compute $(I_3 + A)^{-1}$.


8. Let $n$ be a positive integer. Find the inverse of the $n \times n$ matrix $A$ whose entries on or above the main diagonal are equal to 1 and whose entries (strictly) below the main diagonal are zero.

9. Consider the matrices

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}$$

and

$$C = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix},$$

and let $H$ be the set of all matrices of the form $aA + bB + cC + dI_4$, with $a, b, c, d$ real numbers.

a) Prove that $A^2 = B^2 = C^2 = -I_4$ and

$$BC = -CB = A, \quad CA = -AC = B, \quad AB = -BA = C.$$

b) Prove that the sum and product of two elements of $H$ are elements of $H$.
c) Prove that all nonzero elements of $H$ are invertible.

10. Let $A, B \in M_n(\mathbb{R})$ be such that

$$A + B = I_n \quad \text{and} \quad A^2 + B^2 = O_n.$$

Prove that $A$ and $B$ are invertible and that

$$(A^{-1} + B^{-1})^n = 2^n I_n$$

for all positive integers $n$.

11. Let $A \in M_n(\mathbb{R})$ be an invertible matrix such that

$$A^{-1} = I_n - A.$$

Prove that

$$A^6 - I_n = O_n.$$
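The relations in part a) of exercise 9 (the quaternion-type matrices $A$, $B$, $C$) can be checked mechanically before attempting parts b) and c). The sketch below (our own helper names) evaluates each stated identity with integer arithmetic.

```python
def matmul(A, B):
    """Product rule for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def neg(M):
    return [[-x for x in row] for row in M]


I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
A = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
B = [[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]]
C = [[0, 0, 0, 1], [0, 0, -1, 0], [0, 1, 0, 0], [-1, 0, 0, 0]]

checks = {
    "A^2 = -I4": matmul(A, A) == neg(I4),
    "B^2 = -I4": matmul(B, B) == neg(I4),
    "C^2 = -I4": matmul(C, C) == neg(I4),
    "BC = A": matmul(B, C) == A,
    "CB = -A": matmul(C, B) == neg(A),
    "CA = B": matmul(C, A) == B,
    "AC = -B": matmul(A, C) == neg(B),
    "AB = C": matmul(A, B) == C,
    "BA = -C": matmul(B, A) == neg(C),
}
print(all(checks.values()))  # True
```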
12. Let $A \in M_n(\mathbb{R})$ be a matrix such that $A^2 = \mu A$, where $\mu$ is a real number with $\mu \neq -1$. Prove that

$$(I_n + A)^{-1} = I_n - \frac{1}{\mu + 1} A.$$

13. Recall that a permutation matrix is a matrix $A \in M_n(\mathbb{C})$ such that every row and column of $A$ contains one entry equal to 1 and all other entries are 0. Prove that a permutation matrix is invertible and that its inverse is also a permutation matrix.

14. Suppose that an upper-triangular matrix $A \in M_n(\mathbb{C})$ is invertible. Prove that $A^{-1}$ is also upper-triangular.

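15. Let $a, b, c$ be positive real numbers, not all of them equal, and consider the matrix

$$A = \begin{pmatrix} a & 0 & b & 0 & c & 0 \\ 0 & a & 0 & c & 0 & b \\ c & 0 & a & 0 & b & 0 \\ 0 & b & 0 & a & 0 & c \\ b & 0 & c & 0 & a & 0 \\ 0 & c & 0 & b & 0 & a \end{pmatrix}.$$

Prove that $A$ is invertible. Hint: $A^{-1}$ is a matrix of the same form as $A$ (with $a, b, c$ replaced by suitable real numbers $x, y, z$).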


1.6 The Transpose of a Matrix 


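Let $A \in M_{m,n}(F)$ be an $m \times n$ matrix. The transpose of $A$ is the matrix ${}^tA$ (also denoted as $A^T$) obtained by interchanging the rows and columns of $A$. Consequently ${}^tA$ is an $n \times m$ matrix, i.e., ${}^tA \in M_{n,m}(F)$. It is clear that ${}^tI_n = I_n$. Note that if $A = [a_{ij}]$, then ${}^tA = [a_{ji}]$, that is

$$({}^tA)_{ij} = a_{ji}. \qquad (1.3)$$

Example 1.50. a) The transpose of the matrix $\begin{pmatrix} 1 & 2 & 3 \\ 0 & -1 & -2 \\ 3 & 4 & 5 \end{pmatrix}$ is the matrix $\begin{pmatrix} 1 & 0 & 3 \\ 2 & -1 & 4 \\ 3 & -2 & 5 \end{pmatrix}$.

b) The transpose of the matrix $\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \end{pmatrix}$ is the matrix $\begin{pmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 2 \\ 4 & 3 \end{pmatrix}$.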


The following proposition summarizes the basic properties of the operation $A \mapsto {}^tA$ on $M_{m,n}(F)$.

Proposition 1.51. The transpose operation has the following properties:
1) ${}^t({}^tA) = A$ for all $A \in M_{m,n}(F)$;
2) ${}^t(A + B) = {}^tA + {}^tB$ for all $A, B \in M_{m,n}(F)$;
3) ${}^t(cA) = c\,{}^tA$ if $c \in F$ is a scalar and $A \in M_{m,n}(F)$;
4) ${}^t(AB) = {}^tB\,{}^tA$ if $A \in M_{m,n}(F)$ and $B \in M_{n,p}(F)$;
5) ${}^t(A^k) = ({}^tA)^k$ if $A \in M_n(F)$ and $k$ is a positive integer;
6) If the matrix $A$ is invertible, then ${}^tA$ is also invertible and

$$({}^tA)^{-1} = {}^t(A^{-1}).$$


Proof. Properties 1), 2), and 3) are immediate from the definition of the transpose of a matrix (more precisely from relation (1.3)). Let us prove 4). First, note that ${}^tB \in M_{p,n}(F)$ and ${}^tA \in M_{n,m}(F)$, thus ${}^tB \cdot {}^tA$ makes sense. Next, if $A = [a_{ij}]$ and $B = [b_{jk}]$, we have

$$({}^t(AB))_{ki} = (AB)_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk} = \sum_{j=1}^{n} ({}^tB)_{kj} ({}^tA)_{ji} = ({}^tB \cdot {}^tA)_{ki},$$

thus ${}^t(AB) = {}^tB \cdot {}^tA$.

Property 5) follows by induction on $k$, using property 4). Finally, property 6) also follows from 4), since

$$I_n = {}^tI_n = {}^t(A \cdot A^{-1}) = {}^t(A^{-1}) \cdot {}^tA$$

and similarly ${}^tA \cdot {}^t(A^{-1}) = I_n$. □


It follows from the previous proposition that the transpose operation leaves the general linear group $GL_n(F)$ invariant, that is ${}^tA \in GL_n(F)$ whenever $A \in GL_n(F)$.
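The identities of Proposition 1.51 are easy to test on concrete matrices. The following Python sketch (an illustration only, not part of the original text) stores a matrix as a list of rows; the helper names `transpose` and `matmul` are our own, and integer entries are assumed so that equality checks are exact.

```python
def transpose(A):
    """Return tA, the matrix whose (i, j)-entry is the (j, i)-entry of A (relation (1.3))."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B: (AB)_ik = sum_j a_ij * b_jk."""
    n = len(B)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2, 3], [0, -1, -2]]   # a 2 x 3 matrix
B = [[1, 0], [2, 1], [3, 2]]   # a 3 x 2 matrix

# Property 1): t(tA) = A
assert transpose(transpose(A)) == A
# Property 4): t(AB) = tB . tA (note the reversed order of the factors)
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
print(transpose(matmul(A, B)))   # [[14, -8], [8, -5]]
```

Since `zip(*A)` groups the entries column by column, `transpose` works for rectangular matrices as well as square ones.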

Problem 1.52. Let $X \in F^n$ be a vector with coordinates $x_1, \dots, x_n$, considered as a matrix in $M_{n,1}(F)$. Prove that for any matrix $A \in M_n(F)$ we have

$${}^tX({}^tA \cdot A)X = \sum_{i=1}^{n} (a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n)^2.$$
Solution. First of all, we use Proposition 1.51 to obtain

$${}^tX({}^tA \cdot A)X = {}^tX\,{}^tA\,AX = {}^t(AX) \cdot AX.$$

Write now

$$a_{11}x_1 + \dots + a_{1n}x_n = y_1,\quad a_{21}x_1 + \dots + a_{2n}x_n = y_2,\quad \dots,\quad a_{n1}x_1 + \dots + a_{nn}x_n = y_n.$$

Then

$$AX = Y = \begin{pmatrix} y_1\\ \vdots\\ y_n\end{pmatrix},$$

and using the product rule for matrices, we obtain that the last quantity equals $y_1^2 + \dots + y_n^2$. We conclude that

$${}^tX({}^tA \cdot A)X = {}^tY \cdot Y = y_1^2 + \dots + y_n^2 = \sum_{i=1}^{n} (a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n)^2.$$

□
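Here is a small Python check of the identity in Problem 1.52 (a sketch for illustration; `quadratic_form` and `sum_of_squares` are hypothetical names, and integer entries keep the comparison exact). For the chosen data, $AX$ has coordinates $4, 25, -7$, so both sides equal $16 + 625 + 49 = 690$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]


def quadratic_form(A, x):
    """Left-hand side of Problem 1.52: tX (tA . A) X, with X the column vector of x."""
    X = [[xi] for xi in x]                      # X as an n x 1 matrix
    return matmul(matmul(transpose(X), matmul(transpose(A), A)), X)[0][0]


def sum_of_squares(A, x):
    """Right-hand side of Problem 1.52: sum over i of (a_i1 x_1 + ... + a_in x_n)^2."""
    return sum(sum(a * xi for a, xi in zip(row, x)) ** 2 for row in A)


A = [[1, -2, 0], [3, 1, 4], [0, 2, -1]]
x = [2, -1, 5]
assert quadratic_form(A, x) == sum_of_squares(A, x) == 690
```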


There are three types of special matrices that play a fundamental role in linear algebra and that are related to the transpose operation:

• The symmetric matrices. These are matrices $A \in M_n(F)$ for which ${}^tA = A$, or equivalently $a_{ij} = a_{ji}$ for all $i, j$. They play a crucial role in the theory of quadratic forms and Euclidean spaces (for the latter one chooses $F = \mathbb{R}$), and a whole chapter will be devoted to their subtle properties. For example, all symmetric matrices of order 2 and 3 are of the form

$$\begin{pmatrix} a & b\\ b & c\end{pmatrix},\ a, b, c \in F \quad\text{and}\quad \begin{pmatrix} a & b & c\\ b & d & e\\ c & e & f\end{pmatrix},\ a, b, c, d, e, f \in F.$$

• The orthogonal matrices. These are matrices $A \in GL_n(F)$ for which

$$A^{-1} = {}^tA.$$

They also play a fundamental role in the theory of Euclidean spaces, since these matrices correspond to isometries of such spaces. They will also be extensively studied in the last chapter of the book.

• The skew-symmetric (or antisymmetric) matrices. These are matrices $A \in M_n(F)$ for which

$$A + {}^tA = O_n,$$

that is ${}^tA = -A$. These matrices are related to alternating forms. They satisfy $a_{ij} = -a_{ji}$ for all $i, j$; taking $i = j$ gives $2a_{ii} = 0$. If $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$, then this last equality forces $a_{ii} = 0$ for all $i$. Thus the diagonal elements of a skew-symmetric matrix are in this case equal to 0. On the other hand, over a field $F$ such as $\mathbb{F}_2$ (the field with two elements), the condition $2a_{ii} = 0$ does not give any information about the element $a_{ii}$, since for any $x \in \mathbb{F}_2$ we have $2x = 0$. Actually, over such a field there is no difference between symmetric and skew-symmetric matrices! All skew-symmetric matrices of order 2 and 3 over the field $\mathbb{C}$ of complex numbers are of the following form:

$$\begin{pmatrix} 0 & a\\ -a & 0\end{pmatrix},\ a \in \mathbb{C} \quad\text{and}\quad \begin{pmatrix} 0 & a & b\\ -a & 0 & c\\ -b & -c & 0\end{pmatrix},\ a, b, c \in \mathbb{C}.$$


Proposition 1.53. All matrices in the following statements are square matrices of the same size. Then:

1) The sum of a matrix and its transpose is a symmetric matrix. The difference of a matrix and its transpose is a skew-symmetric matrix.

2) The product of a matrix and its transpose is a symmetric matrix.

3) Any power of a symmetric matrix is symmetric.

4) An even power of a skew-symmetric matrix is symmetric. An odd power of a skew-symmetric matrix is skew-symmetric.

5) If $A$ is invertible and symmetric, then $A^{-1}$ is symmetric.

6) If $A$ is invertible and skew-symmetric, then $A^{-1}$ is skew-symmetric.


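Proof. 1) If $A$ is a matrix, then ${}^t(A + {}^tA) = {}^tA + {}^t({}^tA) = {}^tA + A = A + {}^tA$, thus $A + {}^tA$ is symmetric. Similarly, ${}^t(A - {}^tA) = {}^tA - A = -(A - {}^tA)$, thus $A - {}^tA$ is skew-symmetric.

2) We have ${}^t(A\,{}^tA) = {}^t({}^tA)\,{}^tA = A\,{}^tA$, thus $A\,{}^tA$ is symmetric (the same argument applies to ${}^tA\,A$).

3) and 4) follow from the equality $({}^tA)^n = {}^t(A^n)$, valid for any square matrix $A$ and any $n \geq 1$; for 4), note that if ${}^tA = -A$, then ${}^t(A^n) = (-A)^n = (-1)^nA^n$.

5) and 6) follow from the equality ${}^t(A^{-1}) = ({}^tA)^{-1}$, valid for any invertible matrix $A$. □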


We end this section with a rather long list of problems that illustrate the ideas 
introduced in this section. 
Problem 1.54. Describe the symmetric matrices $A \in M_n(F)$ which are simultaneously upper-triangular.

Solution. Let $A = [a_{ij}]$ be a symmetric and upper-triangular matrix. By definition $a_{ij} = 0$ whenever $i > j$ (since $A$ is upper-triangular) and moreover $a_{ij} = a_{ji}$ for all $i, j \in \{1, 2, \dots, n\}$. We conclude that $a_{ij} = 0$ whenever $i \neq j$, that is $A$ is diagonal. Conversely, any diagonal matrix is clearly symmetric and upper-triangular. Thus the answer to the problem is: the diagonal matrices. □


Problem 1.55. How many symmetric matrices are there in $M_n(\mathbb{F}_2)$?

Solution. By definition, each entry of a matrix $A = [a_{ij}] \in M_n(\mathbb{F}_2)$ is equal to 0 or 1, and $A$ is symmetric if and only if $a_{ij} = a_{ji}$ for all $i, j \in \{1, 2, \dots, n\}$. Thus a symmetric matrix $A$ is entirely determined by the choice of the entries above or on the main diagonal, that is the entries $a_{ij}$ with $1 \leq i \leq j \leq n$. Moreover, for any choice of these entries, we can construct a symmetric matrix by defining $a_{ij} = a_{ji}$ for $i > j$. For each pair $(i, j)$ with $1 \leq i \leq j \leq n$ we have two choices for the entry $a_{ij}$ (either 0 or 1). Since there are $n + \binom{n}{2} = \frac{n(n+1)}{2}$ such pairs $(i, j)$ ($n$ pairs with $i = j$ and $\binom{n}{2}$ pairs with $i < j$) and since the choices are independent, we deduce that the number of symmetric matrices in $M_n(\mathbb{F}_2)$ is $2^{\frac{n(n+1)}{2}}$. □
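For small $n$ the count $2^{n(n+1)/2}$ can be confirmed by brute force. The Python sketch below (our own illustration; the function name is hypothetical) enumerates all $2^{n^2}$ matrices over $\mathbb{F}_2$, so it is only practical for very small $n$.

```python
from itertools import product


def count_symmetric_f2(n):
    """Count symmetric n x n matrices over F_2 by checking all 2^(n*n) of them."""
    count = 0
    for entries in product((0, 1), repeat=n * n):
        A = [entries[i * n:(i + 1) * n] for i in range(n)]   # rows of the matrix
        if all(A[i][j] == A[j][i] for i in range(n) for j in range(i + 1, n)):
            count += 1
    return count


for n in range(1, 4):
    assert count_symmetric_f2(n) == 2 ** (n * (n + 1) // 2)   # 2, 8, 64
```

Only the entries strictly above the diagonal need to be compared with their mirror images, which is why the inner range starts at `i + 1`.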


Problem 1.56. a) Describe the diagonal matrices $A \in M_n(\mathbb{R})$ which are skew-symmetric.
b) Same question, but replacing $\mathbb{R}$ with $\mathbb{F}_2$.

Solution. a) Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a diagonal skew-symmetric matrix. Since $A$ is diagonal, all entries away from the main diagonal are zero. Also, since $A + {}^tA = O_n$ and $A$ and ${}^tA$ have the same diagonal entries, we have

$$a_{ii} + a_{ii} = 0$$

for all $i \in \{1, 2, \dots, n\}$. We conclude that $2a_{ii} = 0$ and so $a_{ii} = 0$ for all $i \in \{1, 2, \dots, n\}$. Thus $A = O_n$ is the unique diagonal skew-symmetric matrix in $M_n(\mathbb{R})$.

b) As in part a), a matrix $A = [a_{ij}] \in M_n(\mathbb{F}_2)$ is diagonal and skew-symmetric if and only if it is diagonal and its diagonal entries $a_{ii}$ (for $1 \leq i \leq n$) satisfy $2a_{ii} = 0$. However, any element $x$ of $\mathbb{F}_2$ satisfies $2x = 0$, thus the condition $2a_{ii} = 0$ is automatic. We conclude that any diagonal matrix $A \in M_n(\mathbb{F}_2)$ is skew-symmetric! □


Problem 1.57. A matrix $A \in M_n(\mathbb{R})$ has a unique nonzero entry in each row and column, and that entry equals 1 or $-1$. Prove that $A$ is orthogonal.

Solution. Let $A = [a_{ij}]$. We need to prove that $A^{-1} = {}^tA$, that is $A \cdot {}^tA = I_n$ and ${}^tA \cdot A = I_n$. Fix $i, j \in \{1, 2, \dots, n\}$. Then the $(i, j)$-entry of $A \cdot {}^tA$ is

$$(A \cdot {}^tA)_{ij} = \sum_{k=1}^{n} a_{ik}a_{jk}.$$

Assume that $a_{ik}a_{jk}$ is nonzero for some $k \in \{1, 2, \dots, n\}$, thus both $a_{ik}$ and $a_{jk}$ are nonzero. If $i \neq j$, this means that $A$ has at least two nonzero entries in column $k$, which is impossible. Thus if $i \neq j$, then $a_{ik}a_{jk} = 0$ for all $k \in \{1, 2, \dots, n\}$ and consequently the $(i, j)$-entry of $A \cdot {}^tA$ is 0.

On the other hand, if $i = j$, then

$$(A \cdot {}^tA)_{ii} = \sum_{k=1}^{n} a_{ik}^2.$$

Now, by assumption the $i$th row of $A$ consists of one entry equal to 1 or $-1$, and all other entries are 0. Since $\sum_{k=1}^{n} a_{ik}^2$ is simply the sum of squares of the entries in the $i$th row, we deduce that $\sum_{k=1}^{n} a_{ik}^2 = 1$ and so $(A \cdot {}^tA)_{ii} = 1$. We conclude that $A \cdot {}^tA = I_n$. The reader will have no problem adapting this argument in order to prove the equality ${}^tA \cdot A = I_n$. □


Remark 1.58. In particular, all such matrices are invertible, a fact which is definitely not obvious. Moreover, it is very easy to compute the inverse of such a matrix: simply take its transpose!
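Remark 1.58 can be illustrated numerically. In the Python sketch below (an illustration under our own conventions, not taken from the text), the hypothetical helper `signed_permutation_matrix(perm, signs)` places the entry `signs[i]` (equal to 1 or -1) in row `i`, column `perm[i]`; since `perm` is a permutation, every row and every column has exactly one nonzero entry, as in Problem 1.57.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def signed_permutation_matrix(perm, signs):
    """Row i has the single nonzero entry signs[i] (1 or -1) in column perm[i]."""
    n = len(perm)
    return [[signs[i] if j == perm[i] else 0 for j in range(n)] for i in range(n)]


A = signed_permutation_matrix([2, 0, 3, 1], [1, -1, -1, 1])
# A . tA = tA . A = I_4, so the inverse of A is simply its transpose
assert matmul(A, transpose(A)) == identity(4)
assert matmul(transpose(A), A) == identity(4)
```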


Problem 1.59. Prove that any matrix $A \in M_n(\mathbb{C})$ can be expressed in a unique way as $B + C$, where $B$ is symmetric and $C$ is skew-symmetric.

Solution. If $A = B + C$ with $B$ symmetric and $C$ skew-symmetric, then necessarily ${}^tA = B - C$, thus

$$B = \frac{1}{2}(A + {}^tA) \quad\text{and}\quad C = \frac{1}{2}(A - {}^tA).$$

This proves uniqueness. Conversely, choosing $B$ and $C$ as in the previous relation, they are symmetric, respectively skew-symmetric (by the previous proposition) and they add up to $A$. □
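The formulas $B = \frac12(A + {}^tA)$ and $C = \frac12(A - {}^tA)$ translate directly into code. The Python sketch below is illustrative only (the name `sym_skew_parts` is ours); it uses exact `Fraction` arithmetic for the division by 2. That division is also why the argument needs a field in which $2 \neq 0$, such as $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$, and not $\mathbb{F}_2$.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def sym_skew_parts(A):
    """Return (B, C) with B = (A + tA)/2 symmetric and C = (A - tA)/2 skew-symmetric."""
    At = transpose(A)
    n = len(A)
    B = [[Fraction(A[i][j] + At[i][j], 2) for j in range(n)] for i in range(n)]
    C = [[Fraction(A[i][j] - At[i][j], 2) for j in range(n)] for i in range(n)]
    return B, C


A = [[1, 2], [5, 4]]
B, C = sym_skew_parts(A)
assert B == transpose(B)                                    # B is symmetric
assert C == [[-c for c in row] for row in transpose(C)]     # C is skew-symmetric
assert [[B[i][j] + C[i][j] for j in range(2)] for i in range(2)] == A   # B + C = A
```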


Problem 1.60. The matrix A = E | is the difference of a symmetric matrix B 
and of a skew-symmetric matrix C. Find B. 


Solution. By assumption we have $A = B - C$ with ${}^tB = B$ and ${}^tC = -C$. Thus

$${}^tA = {}^t(B - C) = {}^tB - {}^tC = B + C.$$

We conclude that

$$A + {}^tA = (B - C) + (B + C) = 2B$$

and so

$$B = \frac{1}{2}(A + {}^tA).$$

□


Problem 1.61. a) Let $A \in M_n(\mathbb{R})$ be a matrix such that ${}^tA \cdot A = O_n$. Prove that $A = O_n$.

b) Does the result in part a) hold if we replace $\mathbb{R}$ with $\mathbb{C}$?

Solution. a) Let $A = [a_{ij}]$. By the product rule for matrices, the $(i, i)$-entry of ${}^tA \cdot A$ is

$$({}^tA \cdot A)_{ii} = \sum_{k=1}^{n} ({}^tA)_{ik}\,a_{ki} = \sum_{k=1}^{n} a_{ki}^2.$$

Since ${}^tA \cdot A = O_n$, we conclude that for all $i \in \{1, 2, \dots, n\}$ we have

$$\sum_{k=1}^{n} a_{ki}^2 = 0.$$

Since the square of a real number is nonnegative, the last equality forces $a_{ki} = 0$ for all $k \in \{1, 2, \dots, n\}$. Since $i$ was arbitrary, we conclude that $A = O_n$.

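b) The result no longer holds. Let us look for a symmetric matrix $A \in M_2(\mathbb{C})$, $A \neq O_2$, such that ${}^tA \cdot A = O_2$, that is $A^2 = O_2$. Since $A$ is symmetric, we can write

$$A = \begin{pmatrix} a & b\\ b & d\end{pmatrix}$$

for some complex numbers $a, b, d$. Now

$${}^tA \cdot A = A^2 = \begin{pmatrix} a & b\\ b & d\end{pmatrix}\begin{pmatrix} a & b\\ b & d\end{pmatrix} = \begin{pmatrix} a^2 + b^2 & b(a + d)\\ b(a + d) & b^2 + d^2\end{pmatrix}.$$

So we look for complex numbers $a, b, d$ which are not all equal to 0 and for which

$$a^2 + b^2 = 0, \quad b(a + d) = 0, \quad b^2 + d^2 = 0.$$

It suffices to ensure that $a + d = 0$ and $a^2 + b^2 = 0$ (then $d^2 = a^2$, so the third equation follows from the first). For instance, one can take $a = i$, $b = 1$, $d = -i$. □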
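The counterexample from part b) can be checked with Python's built-in complex numbers (`1j` denotes $i$). This is only a sanity check of the computation above; because all real and imaginary parts are small integers, the floating-point arithmetic is exact here.

```python
def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]


A = [[1j, 1], [1, -1j]]                 # a = i, b = 1, d = -i
At = [list(col) for col in zip(*A)]     # transpose of A

assert At == A                          # A is symmetric
assert A != [[0, 0], [0, 0]]            # A is not the zero matrix ...
assert matmul(At, A) == [[0, 0], [0, 0]]   # ... yet tA . A = A^2 = O_2
```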


Remark 1.62. We could also have used Problem 1.52 in order to solve part a). Indeed, for any $X \in \mathbb{R}^n$ we have ${}^tX({}^tA \cdot A)X = 0$ and so

$$\sum_{i=1}^{n} (a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n)^2 = 0$$

for any choice of real numbers $x_1, \dots, x_n$. Since a sum of squares of real numbers equals 0 if and only if all these numbers are equal to 0, we deduce that

$$a_{i1}x_1 + \dots + a_{in}x_n = 0$$

for all $i \in \{1, 2, \dots, n\}$ and all real numbers $x_1, \dots, x_n$. Thus $AX = 0$ for all $X \in \mathbb{R}^n$ and then $A = O_n$ (take for $X$ the vectors of the standard basis).


1.6.1 Problems for Practice 


1. Consider the matrices 


[Hh e-i] 


Compute each of the following matrices:

a) $A \cdot {}^tB$.
b) $B \cdot {}^tA$.
c) $(A + 2\,{}^tB)(B + 2\,{}^tA)$.


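2. Let $\theta \in \mathbb{R}$ and let

$$A = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}.$$

a) Prove that $A$ is orthogonal.
b) Find all values of $\theta$ for which $A$ is symmetric.
c) Find all values of $\theta$ for which $A$ is skew-symmetric.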


3. Which matrices $A \in M_n(\mathbb{F}_2)$ are the sum of a symmetric matrix and of a skew-symmetric matrix?
4. Write the matrix

$$A = \begin{pmatrix} 1 & 2 & 3\\ 2 & 3 & 4\\ 3 & 4 & 5\end{pmatrix}$$

as the sum of a symmetric matrix and of a skew-symmetric matrix with real entries.

5. All matrices in the following statements are square matrices of the same size. Prove that

a) The product of two symmetric matrices is a symmetric matrix if and only if the two matrices commute.

b) The product of two antisymmetric matrices is a symmetric matrix if and only if the two matrices commute.

c) The product of a symmetric and a skew-symmetric matrix is skew-symmetric if and only if the two matrices commute.


6. We have seen that the square of a symmetric matrix $A \in M_n(\mathbb{R})$ is symmetric. Is it true that if the square of a matrix $A \in M_n(\mathbb{R})$ is symmetric, then $A$ is symmetric?


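7. Consider the map $\varphi : M_3(\mathbb{R}) \to M_3(\mathbb{R})$ defined by

$$\varphi(A) = {}^tA + 2A.$$

Prove that $\varphi$ is linear, that is

$$\varphi(cA + B) = c\varphi(A) + \varphi(B)$$

for all $A, B \in M_3(\mathbb{R})$ and all $c \in \mathbb{R}$.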


8. Let $A \in M_n(\mathbb{R})$ be a matrix such that $A \cdot {}^tA = O_n$. Prove that $A = O_n$.

9. Find the skew-symmetric matrices $A \in M_n(\mathbb{R})$ such that $A^2 = O_n$.

10. Let $A_1, \dots, A_k \in M_n(\mathbb{R})$ be matrices such that

$${}^tA_1 \cdot A_1 + \dots + {}^tA_k \cdot A_k = O_n.$$

Prove that $A_1 = \dots = A_k = O_n$.

11. a) Let $A \in M_3(\mathbb{R})$ be a skew-symmetric matrix. Prove that there exists a nonzero vector $X \in \mathbb{R}^3$ such that $AX = 0$.

b) Does the result in part a) remain true if we replace $M_3(\mathbb{R})$ with $M_2(\mathbb{R})$?

12. Describe all upper-triangular matrices $A \in M_n(\mathbb{R})$ such that

$$A \cdot {}^tA = {}^tA \cdot A.$$


Chapter 2 
Square Matrices of Order 2 


Abstract The main topic of this chapter is a detailed study of 2 × 2 matrices and their applications, for instance to linear recursive sequences and Pell's equations. The key ingredient is the Cayley–Hamilton theorem, which is systematically used in analyzing the properties of these matrices. Many of these properties will be proved in subsequent chapters by more advanced methods.

Keywords Cayley–Hamilton • Trace • Determinant • Pell's equation • Binomial equation


In this chapter we will study some specific problems involving matrices of order
two and, to make things even more concrete, we will work exclusively with matrices
whose entries are real or complex numbers. The reason for doing this is that in this 
case one can actually perform explicit computations which might help the reader 
become more familiar with the material introduced in the previous chapter. Also, 
many of the results discussed in this chapter in a very special context will later 
on be generalized (very often with completely different methods and tools!). We 
should however warn the reader from the very beginning: studying square matrices 
of order 2 is very far from being trivial, even though it might be tempting to believe 
the contrary. 

A matrix $A \in M_2(\mathbb{C})$ is scalar if it is of the form $zI_2$ for some complex number $z$. One can define the notion of scalar matrix in full generality: if $F$ is a field and $n \geq 1$, the scalar matrices are precisely the matrices of the form $cI_n$, where $c \in F$ is a scalar.


2.1 The Trace and the Determinant Maps 


We introduce now two fundamental invariants of a $2 \times 2$ matrix, which will be generalized and extensively studied in subsequent chapters for $n \times n$ matrices:

Definition 2.1. Consider a matrix $A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} \in M_2(\mathbb{C})$. We define
• the trace of $A$ as
$$\operatorname{Tr}(A) = a_{11} + a_{22};$$
• the determinant of $A$ as
$$\det A = a_{11}a_{22} - a_{12}a_{21}.$$

We also write

$$\det A = \begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix}$$

for the determinant of $A$.
We obtain in this way two maps

$$\operatorname{Tr}, \det : M_2(\mathbb{C}) \to \mathbb{C}$$

which essentially govern the theory of $2 \times 2$ matrices. The following proposition summarizes the main properties of the trace map. The second property is absolutely fundamental. Recall that ${}^tA$ is the transpose of the matrix $A$.


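Proposition 2.2. For all matrices $A, B \in M_2(\mathbb{C})$ and all complex numbers $z \in \mathbb{C}$ we have

(a) $\operatorname{Tr}(A + zB) = \operatorname{Tr}(A) + z\operatorname{Tr}(B)$.

(b) $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$.

(c) $\operatorname{Tr}({}^tA) = \operatorname{Tr}(A)$.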

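Proof. Properties (a) and (c) are readily checked, so let us focus on property (b). Write

$$A = \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_{11} & b_{12}\\ b_{21} & b_{22}\end{pmatrix}.$$

Then

$$AB = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22}\\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}\end{pmatrix}$$

and

$$BA = \begin{pmatrix} b_{11}a_{11} + b_{12}a_{21} & b_{11}a_{12} + b_{12}a_{22}\\ b_{21}a_{11} + b_{22}a_{21} & b_{21}a_{12} + b_{22}a_{22}\end{pmatrix}.$$

Thus

$$\operatorname{Tr}(AB) = a_{11}b_{11} + a_{22}b_{22} + a_{12}b_{21} + a_{21}b_{12} = \operatorname{Tr}(BA).$$

□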


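Remark 2.3. The map $\operatorname{Tr} : M_2(\mathbb{C}) \to \mathbb{C}$ is not multiplicative, i.e., in general $\operatorname{Tr}(AB) \neq \operatorname{Tr}(A)\operatorname{Tr}(B)$. For instance $\operatorname{Tr}(I_2 \cdot I_2) = \operatorname{Tr}(I_2) = 2$, while $\operatorname{Tr}(I_2) \cdot \operatorname{Tr}(I_2) = 2 \cdot 2 = 4 \neq 2$.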


Let us turn now to properties of the determinant map: 


Proposition 2.4. For all matrices $A, B \in M_2(\mathbb{C})$ and all complex numbers $\alpha$ we have

(1) $\det(AB) = \det A \cdot \det B$;

(2) $\det {}^tA = \det A$;

(3) $\det(\alpha A) = \alpha^2 \det A$.

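Proof. Properties (2) and (3) follow readily from the definition of a determinant. Property (1) will be checked by a painful direct computation. Let

$$A = \begin{pmatrix} a & b\\ c & d\end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} x & y\\ z & t\end{pmatrix}.$$

Then

$$AB = \begin{pmatrix} ax + bz & ay + bt\\ cx + dz & cy + dt\end{pmatrix}$$

and so

$$\begin{aligned} \det(AB) &= (ax + bz)(cy + dt) - (cx + dz)(ay + bt)\\ &= acxy + adxt + bcyz + bdzt - acxy - bcxt - adyz - bdzt\\ &= xt(ad - bc) - yz(ad - bc) = (ad - bc)(xt - yz) = \det A \cdot \det B, \end{aligned}$$

as desired. □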
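Propositions 2.2 and 2.4 and Remark 2.3 can be spot-checked on concrete $2 \times 2$ matrices. The Python helpers `tr`, `det` and `mul` below are hypothetical names chosen for this sketch; a numerical check is of course no substitute for the proofs.

```python
def tr(A):
    """Trace of a 2 x 2 matrix: a11 + a22."""
    return A[0][0] + A[1][1]


def det(A):
    """Determinant of a 2 x 2 matrix: a11 a22 - a12 a21."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def mul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]

assert tr(mul(A, B)) == tr(mul(B, A)) == 15                            # Proposition 2.2 (b)
assert tr(mul(A, B)) != tr(A) * tr(B)                                  # Remark 2.3: 15 != 5 * 2
assert det(mul(A, B)) == det(A) * det(B) == -10                        # Proposition 2.4 (1)
assert det([[A[j][i] for j in range(2)] for i in range(2)]) == det(A)  # Proposition 2.4 (2)
assert det([[3 * a for a in row] for row in A]) == 9 * det(A)          # Proposition 2.4 (3), alpha = 3
```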


Problem 2.5. Let $A \in M_2(\mathbb{R})$ be such that

$$\det(A + 2I_2) = \det(A - I_2).$$

Prove that

$$\det(A + I_2) = \det(A).$$

Solution. Write $A = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$. The condition becomes

$$\begin{vmatrix} a + 2 & b\\ c & d + 2\end{vmatrix} = \begin{vmatrix} a - 1 & b\\ c & d - 1\end{vmatrix},$$

or equivalently

$$(a + 2)(d + 2) - bc = (a - 1)(d - 1) - bc.$$

Expanding and canceling similar terms, we obtain $3(a + d) = -3$, that is the equivalent relation $a + d = -1$.

Using similar arguments, the equality $\det(A + I_2) = \det A$ is equivalent to $(a + 1)(d + 1) - bc = ad - bc$, or $a + d = -1$. The result follows. □


2.1.1 Problems for Practice 


1. Compute the trace and the determinant of the following matrices 


E fs) ef) 


2. Let A = p T Compute the determinant of A’. 
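3. The trace of $A \in M_2(\mathbb{C})$ equals 0. Prove that the trace of $A^3$ also equals 0.

4. Prove that for all matrices $A \in M_2(\mathbb{C})$ we have

$$\det A = \frac{(\operatorname{Tr}(A))^2 - \operatorname{Tr}(A^2)}{2}.$$

5. Prove that for all matrices $A, B \in M_2(\mathbb{C})$ we have

$$\det(A + B) = \det A + \det B + \operatorname{Tr}(A) \cdot \operatorname{Tr}(B) - \operatorname{Tr}(AB).$$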


6. Let $f : M_2(\mathbb{C}) \to \mathbb{C}$ be a map with the property that for all matrices $A, B \in M_2(\mathbb{C})$ and all complex numbers $z$ we have

$$f(A + zB) = f(A) + zf(B) \quad\text{and}\quad f(AB) = f(BA).$$

(a) Consider the matrices

$$E_{11} = \begin{pmatrix} 1 & 0\\ 0 & 0\end{pmatrix},\quad E_{12} = \begin{pmatrix} 0 & 1\\ 0 & 0\end{pmatrix},\quad E_{21} = \begin{pmatrix} 0 & 0\\ 1 & 0\end{pmatrix},\quad E_{22} = \begin{pmatrix} 0 & 0\\ 0 & 1\end{pmatrix}$$

and define $x_{ij} = f(E_{ij})$. Check that $E_{12}E_{21} = E_{11}$ and $E_{21}E_{12} = E_{22}$, and deduce that $x_{11} = x_{22}$.

(b) Check that $E_{11}E_{12} = E_{12}$ and $E_{12}E_{11} = O_2$, and deduce that $x_{12} = 0$. Using a similar argument, prove that $x_{21} = 0$.

(c) Conclude that there is a complex number $c$ such that

$$f(A) = c \cdot \operatorname{Tr}(A)$$

for all matrices $A$.


2.2 The Characteristic Polynomial and the Cayley–Hamilton Theorem


Let $A \in M_2(\mathbb{C})$. The characteristic polynomial of $A$ is by definition the polynomial denoted $\det(XI_2 - A)$ and defined by

$$\det(XI_2 - A) = X^2 - \operatorname{Tr}(A)X + \det A.$$

We note straight away that $AB$ and $BA$ have the same characteristic polynomial for all matrices $A, B \in M_2(\mathbb{C})$, since $AB$ and $BA$ have the same trace and the same determinant, by results established in the previous section. In particular, if $P$ is invertible, then $A$ and $PAP^{-1}$ have the same characteristic polynomial (apply the previous observation to the matrices $PA$ and $P^{-1}$).

The notation $\det(XI_2 - A)$ is rather suggestive, and it is indeed coherent, in the sense that for any complex number $z$, if we evaluate the characteristic polynomial of $A$ at $z$, we obtain precisely the determinant of the matrix $zI_2 - A$. More generally, we have the following very useful result:


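Problem 2.6. For any two matrices $A, B \in M_2(\mathbb{C})$ there is a complex number $u$ such that

$$\det(A + zB) = \det A + uz + \det B \cdot z^2$$

for all complex numbers $z$. If $A, B$ have integer/rational/real entries, then $u$ is integer/rational/real.

Solution. Write $A = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$ and $B = \begin{pmatrix} \alpha & \beta\\ \gamma & \delta\end{pmatrix}$. Then

$$\begin{aligned} \det(A + zB) &= \begin{vmatrix} a + z\alpha & b + z\beta\\ c + z\gamma & d + z\delta\end{vmatrix} = (a + z\alpha)(d + z\delta) - (b + z\beta)(c + z\gamma)\\ &= z^2(\alpha\delta - \beta\gamma) + z(a\delta + d\alpha - \beta c - \gamma b) + ad - bc. \end{aligned}$$

Since $\alpha\delta - \beta\gamma = \det B$ and $ad - bc = \det A$, the result follows with $u = a\delta + d\alpha - \beta c - \gamma b$. □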


In other words, for any two matrices $A, B \in M_2(\mathbb{C})$ we can define a polynomial $\det(A + XB)$ of degree at most 2 which, evaluated at any complex number $z$, gives $\det(A + zB)$. Moreover, $\det(A + XB)$ has constant term $\det A$ and coefficient of $X^2$ equal to $\det B$, and if $A, B$ have rational/integer/real entries, then this polynomial has rational/integer/real coefficients. Before moving on, let us practice some problems to better digest these ideas.
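The explicit formula for $u$ obtained in the solution of Problem 2.6 gives all three coefficients of $\det(A + XB)$. The Python sketch below (the function name `det_pencil` is hypothetical) computes them and compares the resulting polynomial with a direct evaluation of $\det(A + zB)$ at several integer points.

```python
def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def det_pencil(A, B):
    """Return (det A, u, det B) with det(A + zB) = det A + u z + det B z^2 (Problem 2.6)."""
    (a, b), (c, d) = A
    (alpha, beta), (gamma, delta) = B
    u = a * delta + d * alpha - beta * c - gamma * b
    return det(A), u, det(B)


A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]
c0, u, c2 = det_pencil(A, B)          # here (-2, -5, 5)
for z in range(-3, 4):
    A_plus_zB = [[A[i][j] + z * B[i][j] for j in range(2)] for i in range(2)]
    assert det(A_plus_zB) == c0 + u * z + c2 * z ** 2
```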


Problem 2.7. Let $U, V \in M_2(\mathbb{R})$. Using the polynomial $\det(U + XV)$, prove that

$$\det(U + V) + \det(U - V) = 2\det U + 2\det V.$$

Solution. Write

$$f(X) = \det(U + XV) = \det V \cdot X^2 + mX + \det U$$

for some $m \in \mathbb{R}$ (Problem 2.6). Then

$$\det(U + V) + \det(U - V) = f(1) + f(-1) = (\det V + m + \det U) + (\det V - m + \det U) = 2(\det U + \det V).$$

□


Problem 2.8. Let $A, B \in M_2(\mathbb{R})$. Using the previous problem, prove that

$$\det(A^2 + B^2) + \det(AB + BA) \geq 0.$$

Solution. As suggested, we use the identity

$$\det(U + V) + \det(U - V) = 2\det U + 2\det V$$

from Problem 2.7, and take $U = A^2 + B^2$, $V = AB + BA$. Thus

$$\det(A^2 + B^2 + AB + BA) + \det(A^2 + B^2 - AB - BA) = 2\det(A^2 + B^2) + 2\det(AB + BA).$$

As $A^2 + B^2 + AB + BA = (A + B)^2$ and $A^2 + B^2 - AB - BA = (A - B)^2$, we obtain

$$2\det(A^2 + B^2) + 2\det(AB + BA) = (\det(A + B))^2 + (\det(A - B))^2 \geq 0,$$

since $\det(A + B)$ and $\det(A - B)$ are real numbers. □


Problem 2.9. Let $A, B \in M_2(\mathbb{R})$. Using the polynomial

$$f(X) = \det(I_2 + AB + X(BA - AB)),$$

prove that

$$\det\left(I_2 + \frac{2AB + 3BA}{5}\right) = \det\left(I_2 + \frac{3AB + 2BA}{5}\right).$$

Solution. As suggested, consider the polynomial of degree at most 2

$$f(X) = \det(I_2 + AB + X(BA - AB)).$$

We need to prove that $f\left(\frac{3}{5}\right) = f\left(\frac{2}{5}\right)$. We claim that $f(X) = f(1 - X)$, which clearly implies the desired result. The polynomial $g(X) = f(X) - f(1 - X)$ has degree at most 1 (by Problem 2.6, the coefficient of $X^2$ in both $f(X)$ and $f(1 - X)$ equals $\det(BA - AB)$) and satisfies $g(0) = g(1) = 0$. Indeed, we have

$$g(0) = f(0) - f(1) = \det(I_2 + AB) - \det(I_2 + BA) = 0,$$

since $AB$ and $BA$ have the same characteristic polynomial. Also, $g(1) = f(1) - f(0) = 0$. Thus $g$ must be the zero polynomial and the result follows. □
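The symmetry $f(X) = f(1 - X)$ at the heart of this solution is easy to observe numerically. In the following Python sketch (illustrative only; `f` is our name for the polynomial of Problem 2.9, with the matrices passed as arguments), exact `Fraction` arithmetic avoids rounding issues.

```python
from fractions import Fraction


def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def mul(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def f(A, B, x):
    """f(x) = det(I_2 + AB + x (BA - AB)), the polynomial used in Problem 2.9."""
    AB, BA = mul(A, B), mul(B, A)
    M = [[(1 if i == j else 0) + AB[i][j] + x * (BA[i][j] - AB[i][j]) for j in range(2)]
         for i in range(2)]
    return det(M)


A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]
for x in (Fraction(0), Fraction(1, 3), Fraction(2, 5), Fraction(7, 4)):
    assert f(A, B, x) == f(A, B, 1 - x)                       # f(X) = f(1 - X)
assert f(A, B, Fraction(2, 5)) == f(A, B, Fraction(3, 5))     # the identity of Problem 2.9
```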


We introduce now another crucial tool in the theory of matrices, which will 
be vastly generalized in subsequent chapters to n x n matrices (using completely 
different ideas and techniques). 


Definition 2.10. The eigenvalues of a matrix $A \in M_2(\mathbb{C})$ are the roots of its characteristic polynomial, in other words they are the complex solutions $\lambda_1, \lambda_2$ of the equation

$$\det(tI_2 - A) = t^2 - \operatorname{Tr}(A)t + \det A = 0.$$

Note that

$$\lambda_1 + \lambda_2 = \operatorname{Tr}(A) \quad\text{and}\quad \lambda_1\lambda_2 = \det A,$$

i.e., the trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues. Indeed, by definition of $\lambda_1$ and $\lambda_2$ the characteristic polynomial is $(X - \lambda_1)(X - \lambda_2)$, and identifying the coefficients of $X$ and of $X^0 = 1$ yields the desired relations.
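Numerically, the eigenvalues of a $2 \times 2$ matrix can be obtained by applying the quadratic formula to $t^2 - \operatorname{Tr}(A)t + \det A$. The Python sketch below (the helper name `eigenvalues` is hypothetical) uses `cmath.sqrt` so that non-real eigenvalues are handled, and checks the relations $\lambda_1 + \lambda_2 = \operatorname{Tr}(A)$ and $\lambda_1\lambda_2 = \det A$ up to rounding.

```python
import cmath


def eigenvalues(A):
    """Roots of t^2 - Tr(A) t + det A, via the quadratic formula (complex in general)."""
    t = A[0][0] + A[1][1]
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(t * t - 4 * d)
    return (t + root) / 2, (t - root) / 2


for A in ([[0, -1], [1, 0]], [[2, 1], [1, 2]], [[1, 2], [3, 4]]):
    l1, l2 = eigenvalues(A)
    trace = A[0][0] + A[1][1]
    determinant = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    assert abs((l1 + l2) - trace) < 1e-9          # sum of eigenvalues = trace
    assert abs(l1 * l2 - determinant) < 1e-9      # product of eigenvalues = determinant
```

The first test matrix is a rotation by a quarter turn, whose eigenvalues $\pm i$ are not real.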

The following result is absolutely fundamental for the study of square 
matrices of order 2. 


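Theorem 2.11 (Cayley–Hamilton). For any $A \in M_2(\mathbb{C})$ we have

$$A^2 - \operatorname{Tr}(A) \cdot A + (\det A) \cdot I_2 = O_2.$$

Proof. Write $A = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$; then a direct computation shows that

$$A^2 = \begin{pmatrix} a^2 + bc & b(a + d)\\ c(a + d) & d^2 + bc\end{pmatrix}.$$

Letting $x = \operatorname{Tr}(A) = a + d$, we obtain

$$\begin{aligned} A^2 - \operatorname{Tr}(A) \cdot A + (\det A) \cdot I_2 &= \begin{pmatrix} a^2 + bc & bx\\ cx & d^2 + bc\end{pmatrix} - \begin{pmatrix} ax & bx\\ cx & dx\end{pmatrix} + \begin{pmatrix} ad - bc & 0\\ 0 & ad - bc\end{pmatrix}\\ &= \begin{pmatrix} a^2 + ad - ax & 0\\ 0 & d^2 + ad - dx\end{pmatrix} = O_2, \end{aligned}$$

since $a^2 + ad - ax = a(a + d - x) = 0$ and similarly $d^2 + ad - dx = d(a + d - x) = 0$. □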
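Theorem 2.11 can be spot-checked on a few matrices, including one with complex entries. The Python function `cayley_hamilton_residual` below is a hypothetical helper for this sketch: it returns $A^2 - \operatorname{Tr}(A)A + (\det A)I_2$, which the theorem asserts is always $O_2$. All real and imaginary parts involved are small integers, so the arithmetic is exact.

```python
def mul(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]


def cayley_hamilton_residual(A):
    """Return A^2 - Tr(A) A + (det A) I_2; Theorem 2.11 says this is O_2."""
    t = A[0][0] + A[1][1]
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = mul(A, A)
    return [[A2[i][j] - t * A[i][j] + (d if i == j else 0) for j in range(2)]
            for i in range(2)]


for A in ([[1, 2], [3, 4]], [[0, -1], [5, 2]], [[2 + 1j, 3], [-1j, 7]]):
    assert cayley_hamilton_residual(A) == [[0, 0], [0, 0]]
```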


Remark 2.12. (a) In other words, the matrix $A$ is a solution of the characteristic equation

$$\det(tI_2 - A) = t^2 - \operatorname{Tr}(A)t + \det A = 0.$$

(b) Expressed in terms of the eigenvalues $\lambda_1$ and $\lambda_2$ of $A$, the Cayley–Hamilton theorem can be written either

$$A^2 - (\lambda_1 + \lambda_2)A + \lambda_1\lambda_2 \cdot I_2 = O_2 \qquad (2.1)$$

or equivalently

$$(A - \lambda_1 \cdot I_2)(A - \lambda_2 \cdot I_2) = O_2. \qquad (2.2)$$

Both relations are extremely useful when dealing with square matrices of order 2, and we will see many applications in subsequent sections.
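Problem 2.13. Let $A \in M_2(\mathbb{C})$ have eigenvalues $\lambda_1$ and $\lambda_2$. Prove that for all $n \geq 1$ we have

$$\operatorname{Tr}(A^n) = \lambda_1^n + \lambda_2^n.$$

Deduce that $\lambda_1^n$ and $\lambda_2^n$ are the eigenvalues of $A^n$.

Solution. Let $x_n = \operatorname{Tr}(A^n)$. Multiplying relation (2.1) by $A^n$ and taking the trace yields

$$x_{n+2} - (\lambda_1 + \lambda_2)x_{n+1} + \lambda_1\lambda_2 x_n = 0.$$

Since $x_0 = \operatorname{Tr}(I_2) = 2$ and $x_1 = \operatorname{Tr}(A) = \lambda_1 + \lambda_2$, an immediate induction shows that $x_n = \lambda_1^n + \lambda_2^n$ for all $n$ (the sequence $\lambda_1^n + \lambda_2^n$ satisfies the same recurrence and the same initial conditions).

For the second part, let $z_1, z_2$ be the eigenvalues of $A^n$. By definition, they are the solutions of the equation $t^2 - \operatorname{Tr}(A^n)t + \det(A^n) = 0$. Since $\det(A^n) = (\det A)^n = \lambda_1^n\lambda_2^n$ and $\operatorname{Tr}(A^n) = \lambda_1^n + \lambda_2^n$, the previous equation is equivalent to

$$t^2 - (\lambda_1^n + \lambda_2^n)t + \lambda_1^n\lambda_2^n = 0 \quad\text{or}\quad (t - \lambda_1^n)(t - \lambda_2^n) = 0.$$

The result follows. □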


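Problem 2.14. Let $A \in M_2(\mathbb{C})$ be a matrix with $\operatorname{Tr}(A) \neq 0$. Prove that a matrix $B \in M_2(\mathbb{C})$ commutes with $A$ if and only if $B$ commutes with $A^2$.

Solution. Clearly, if $BA = AB$, then $BA^2 = A^2B$, so assume conversely that $BA^2 = A^2B$. Using the Cayley–Hamilton theorem in the form $A^2 = \operatorname{Tr}(A)A - \det A \cdot I_2$, we can write this relation as

$$B(\operatorname{Tr}(A)A - \det A \cdot I_2) = (\operatorname{Tr}(A)A - \det A \cdot I_2)B$$

or

$$\operatorname{Tr}(A)(BA - AB) = O_2.$$

Since $\operatorname{Tr}(A) \neq 0$, we obtain $BA = AB$, as desired. □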


Problem 2.15. Prove that for any matrices $A, B \in M_2(\mathbb{R})$ there is a real number $\alpha$ such that $(AB - BA)^2 = \alpha I_2$.

Solution. Let $X = AB - BA$. Since $\operatorname{Tr}(X) = \operatorname{Tr}(AB) - \operatorname{Tr}(BA) = 0$, the Cayley–Hamilton theorem yields $X^2 = -\det X \cdot I_2$, and so we can take $\alpha = -\det X$. □


Problem 2.16. Let $X \in M_2(\mathbb{R})$ be a matrix such that $\det(X^2 + I_2) = 0$. Prove that $X^2 + I_2 = O_2$.

Solution. Since $X^2 + I_2 = (X + iI_2)(X - iI_2)$, we have $\det(X + iI_2) = 0$ or $\det(X - iI_2) = 0$, and since $\det(X - iI_2) = \overline{\det(X + iI_2)}$ (the entries of $X$ being real), we deduce that $\det(X + iI_2) = 0 = \det(X - iI_2)$. If $X = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$, the relation $\det(X + iI_2) = 0$ is equivalent to $(a + i)(d + i) - bc = 0$, i.e., $ad - bc = 1$ and $a + d = 0$ (separate the real and imaginary parts). Thus $\det X = 1$ and $\operatorname{Tr}(X) = 0$ and we conclude using the Cayley–Hamilton theorem. □


An important consequence of the Cayley–Hamilton theorem is the following result (which can of course be proved directly by hand).


Theorem 2.17. A matrix $A \in M_2(\mathbb{C})$ is invertible if and only if $\det A \neq 0$. If this is the case, then

$$A^{-1} = \frac{1}{\det A}(\operatorname{Tr}(A) \cdot I_2 - A).$$

Proof. Suppose that $A$ is invertible. Then taking the determinant in the equality $A \cdot A^{-1} = I_2$, we obtain

$$\det A \cdot \det A^{-1} = \det I_2 = 1,$$

thus $\det A \neq 0$.
Conversely, suppose that $\det A \neq 0$ and define

$$B = \frac{1}{\det A}(\operatorname{Tr}(A) \cdot I_2 - A).$$

Then using the Cayley–Hamilton theorem we obtain

$$AB = \frac{1}{\det A}(\operatorname{Tr}(A) \cdot A - A^2) = \frac{1}{\det A} \cdot \det A \cdot I_2 = I_2$$

and similarly $BA = I_2$. Thus $A$ is invertible and $A^{-1} = B$. □


Remark 2.18. One can also check directly that if $\det A \neq 0$, then $A$ is invertible, its inverse being given by

$$A^{-1} = \frac{1}{\det A}\begin{pmatrix} a_{22} & -a_{12}\\ -a_{21} & a_{11}\end{pmatrix}.$$

Problem 2.19. Let $A, B \in M_2(\mathbb{C})$ be two matrices such that $AB = I_2$. Then $A$ is invertible and $B = A^{-1}$. In particular, we have $BA = I_2$.

Solution. Since $AB = I_2$, we have $\det A \cdot \det B = \det(AB) = 1$, thus $\det A \neq 0$. The previous theorem shows that $A$ is invertible. Multiplying the equality $AB = I_2$ by $A^{-1}$ on the left, we obtain $B = A^{-1}$. Finally, $BA = A^{-1}A = I_2$. □


A very important consequence of the previous theorem is the following characterization of eigenvalues:

Theorem 2.20. If $A \in M_2(\mathbb{C})$ and $z \in \mathbb{C}$, then the following assertions are equivalent:

(a) $z$ is an eigenvalue of $A$;
(b) $\det(zI_2 - A) = 0$;
(c) There is a nonzero vector $v \in \mathbb{C}^2$ such that $Av = zv$.

Proof. By definition of eigenvalues, (a) implies (b). Suppose that (b) holds and let $B = A - zI_2$. The assumption implies that $\det B = 0$ (for $2 \times 2$ matrices $\det(A - zI_2) = \det(zI_2 - A)$, by Proposition 2.4 (3)) and so $b_{11}b_{22} = b_{12}b_{21}$. We need to prove that we can find $x_1, x_2 \in \mathbb{C}$ not both zero and such that

$$b_{11}x_1 + b_{12}x_2 = 0 \quad\text{and}\quad b_{21}x_1 + b_{22}x_2 = 0.$$

If $b_{11} \neq 0$ or $b_{12} \neq 0$, choose $x_2 = b_{11}$ and $x_1 = -b_{12}$, so suppose that $b_{11} = 0 = b_{12}$. If one of $b_{21}, b_{22}$ is nonzero, choose $x_1 = -b_{22}$ and $x_2 = b_{21}$, otherwise choose $x_1 = x_2 = 1$. In every case $v = \begin{pmatrix} x_1\\ x_2\end{pmatrix}$ is nonzero and $Bv = 0$, i.e., $Av = zv$. Thus (b) implies (c).

Suppose now that (c) holds. Then $A^2v = zAv = z^2v$ and using relation (2.1) we obtain

$$(z^2 - \operatorname{Tr}(A)z + \det A)v = 0.$$

Since $v \neq 0$, this forces $z^2 - \operatorname{Tr}(A)z + \det A = 0$ and so $z$ is an eigenvalue of $A$. Thus (c) implies (a) and the theorem is proved. □


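Problem 2.21. Let $A \in M_2(\mathbb{C})$ have two distinct eigenvalues $\lambda_1, \lambda_2$. Prove that we can find an invertible matrix $P \in GL_2(\mathbb{C})$ such that

$$A = P\begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}P^{-1}.$$

Solution. By the previous theorem, we can find two nonzero vectors $X_1 = \begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix}$ and $X_2 = \begin{pmatrix} x_{12}\\ x_{22}\end{pmatrix}$ such that $AX_1 = \lambda_1X_1$ and $AX_2 = \lambda_2X_2$.

Consider the matrix $P = \begin{pmatrix} x_{11} & x_{12}\\ x_{21} & x_{22}\end{pmatrix}$ whose columns are $X_1, X_2$. A simple computation shows that the columns of $AP$ are $\lambda_1X_1$ and $\lambda_2X_2$, which are the columns of $P\begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}$, thus

$$AP = P\begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}.$$

It remains to see that if $\lambda_1 \neq \lambda_2$, then $P$ is invertible (we haven't used so far the hypothesis $\lambda_1 \neq \lambda_2$); multiplying the last equality on the right by $P^{-1}$ then gives the result. Suppose that $\det P = 0$, thus $x_{11}x_{22} = x_{21}x_{12}$. Since $X_1 \neq 0$, this easily implies that the columns of $P$ are proportional, say the second column $X_2$ is $\alpha$ times the first column $X_1$. Thus $X_2 = \alpha X_1$. Then

$$\lambda_2X_2 = AX_2 = \alpha AX_1 = \alpha\lambda_1X_1 = \lambda_1X_2,$$

forcing $(\lambda_1 - \lambda_2)X_2 = 0$. This is impossible as both $\lambda_1 - \lambda_2$ and $X_2$ are nonzero. The problem is solved. □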




Problem 2.22. Solve in $M_2(\mathbb{C})$ the following equations

(a) $A^2 = O_2$.
(b) $A^2 = I_2$.
(c) $A^2 = A$.

Solution. (a) Let $A$ be a solution of the problem. Then $\det A = 0$ and the Cayley–Hamilton relation reduces to $\operatorname{Tr}(A)A = O_2$. Taking the trace yields $\operatorname{Tr}(A)^2 = 0$, thus $\operatorname{Tr}(A) = 0$. Conversely, if $\det A = 0$ and $\operatorname{Tr}(A) = 0$, then the Cayley–Hamilton theorem shows that $A^2 = O_2$. Thus the solutions of the problem are the matrices

$$A = \begin{pmatrix} a & b\\ c & -a\end{pmatrix} \quad\text{with } a, b, c \in \mathbb{C} \text{ and } a^2 + bc = 0.$$

(b) We must have $\det A = \pm 1$ (since $(\det A)^2 = \det I_2 = 1$) and, by the Cayley–Hamilton theorem, $I_2 - \operatorname{Tr}(A)A + \det A \cdot I_2 = O_2$. If $\det A = 1$, then $\operatorname{Tr}(A)A = 2I_2$ and taking the trace yields $\operatorname{Tr}(A)^2 = 4$, thus $\operatorname{Tr}(A) = \pm 2$. This yields two solutions, $A = \pm I_2$. Suppose that $\det A = -1$. Then $\operatorname{Tr}(A)A = O_2$ and taking the trace gives $\operatorname{Tr}(A) = 0$. Conversely, any matrix $A$ with $\operatorname{Tr}(A) = 0$ and $\det A = -1$ is a solution of the problem (again by Cayley–Hamilton). Thus the solutions of the equation are

$$\pm I_2 \quad\text{and}\quad \begin{pmatrix} a & b\\ c & -a\end{pmatrix},\quad a, b, c \in \mathbb{C},\ a^2 + bc = 1.$$

(c) If $\det A \neq 0$, then multiplying by $A^{-1}$ yields $A = I_2$. So suppose that $\det A = 0$. The Cayley–Hamilton theorem yields $A - \operatorname{Tr}(A)A = O_2$. If $\operatorname{Tr}(A) \neq 1$, this forces $A = O_2$, which is a solution of the problem. Thus if $A \neq O_2, I_2$, then $\det A = 0$ and $\operatorname{Tr}(A) = 1$. Conversely, all such matrices are solutions of the problem (again by Cayley–Hamilton). Thus the solutions of the problem are

$$O_2,\ I_2 \quad\text{and}\quad \begin{pmatrix} a & b\\ c & 1 - a\end{pmatrix},\quad a, b, c \in \mathbb{C},\ a^2 + bc = a.$$

□


Problem 2.23. Let $A \in M_2(\mathbb{C})$ be a matrix. Prove that the following statements are equivalent:

(a) $\operatorname{Tr}(A) = \det A = 0$.
(b) $A^2 = O_2$.
(c) $\operatorname{Tr}(A) = \operatorname{Tr}(A^2) = 0$.
(d) There exists $n \geq 2$ such that $A^n = O_2$.

Solution. Taking the trace in the Cayley–Hamilton relation, we see that $\operatorname{Tr}(A^2) = \operatorname{Tr}(A)^2 - 2\det A$. From this it is clear that (a) and (c) are equivalent. That (a) implies (b) is just an application of the Cayley–Hamilton theorem. That (b) implies (d) is obvious. Thus we need only show that (d) implies (a). If $A^n = O_2$ for some $n \geq 2$, then clearly $\det A = 0$ (since $(\det A)^n = \det(A^n) = 0$). Thus the Cayley–Hamilton theorem reads $A^2 = \operatorname{Tr}(A)A$. Iterating this, an immediate induction gives $A^n = \operatorname{Tr}(A)^{n-1}A$, hence $O_2 = \operatorname{Tr}(A)^{n-1}A$. Taking the trace of this identity gives $0 = \operatorname{Tr}(A)^n$ and hence $\operatorname{Tr}(A) = 0$. □


Problem 2.24. Find all matrices $X \in M_2(\mathbb{R})$ such that $X^3 = I_2$.

Solution. We must have $(\det X)^3 = 1$ and so $\det X = 1$ (since $\det X \in \mathbb{R}$). Letting $t = \operatorname{Tr}(X)$, the Cayley–Hamilton theorem (which now reads $X^2 = tX - I_2$) and the given equation yield

$$I_2 = X^3 = X(tX - I_2) = t(tX - I_2) - X = (t^2 - 1)X - tI_2.$$

If $t^2 \neq 1$, then the previous relation shows that $X$ is scalar and since $X^3 = I_2$, we must have $X = I_2$. If $t^2 = 1$, then the previous relation gives $t = -1$. Conversely, any matrix $X \in M_2(\mathbb{R})$ with $\operatorname{Tr}(X) = -1$ and $\det X = 1$ satisfies $X^2 + X + I_2 = O_2$ and so also $X^3 - I_2 = (X - I_2)(X^2 + X + I_2) = O_2$. We conclude that the solutions of the problem are

$$I_2 \quad\text{and}\quad \begin{pmatrix} a & b\\ c & -1 - a\end{pmatrix},\quad a, b, c \in \mathbb{R},\ a^2 + a + bc = -1.$$


2.2.1 Problems for Practice 


1. Let $A, B \in M_2(\mathbb{R})$ be commuting matrices. Prove that

$$\det(A^2 + B^2) \geq 0.$$

Hint: check that $A^2 + B^2 = (A + iB)(A - iB)$.

2. Let $A, B \in M_2(\mathbb{R})$ be such that $AB = BA$ and $\det(A^2 + B^2) = 0$. Prove that $\det A = \det B$. Hint: use the hint of the previous problem and consider the polynomial $\det(A + XB)$.

3. Let $A, B, C \in M_2(\mathbb{R})$ be pairwise commuting matrices and let

$$f(X) = \det(A^2 + B^2 + C^2 + X(AB + BC + CA)).$$

(a) Prove that $f(2) \geq 0$. Hint: check that

$$A^2 + B^2 + C^2 + 2(AB + BC + CA) = (A + B + C)^2.$$

(b) Prove that $f(-1) \geq 0$. Hint: denote $X = A - B$ and $Y = B - C$ and check that

$$A^2 + B^2 + C^2 - (AB + BC + CA) = \left(X + \frac{1}{2}Y\right)^2 + \left(\frac{\sqrt{3}}{2}Y\right)^2.$$

Next use the first problem.
(c) Deduce that

$$\det(A^2 + B^2 + C^2) + 2\det(AB + BC + CA) \geq 0.$$


4. Let $A, B \in M_2(\mathbb{C})$ be matrices with $\operatorname{Tr}(AB) = 0$. Prove that $(AB)^2 = (BA)^2$. Hint: use the Cayley–Hamilton theorem.

5. Let $A$ be a $2 \times 2$ matrix with rational entries with the property that

$$\det(A^2 - 2I_2) = 0.$$

Prove that $A^2 = 2I_2$ and $\det A = -2$. Hint: use the fact that $A^2 - 2I_2 = (A - \sqrt{2}I_2)(A + \sqrt{2}I_2)$ and consider the characteristic polynomial of $A$.

6. Let $x$ be a positive real number and let $A \in M_2(\mathbb{R})$ be a matrix such that $\det(A^2 + xI_2) = 0$. Prove that

$$\det(A^2 + A + xI_2) = x.$$

7. Let $A, B \in M_2(\mathbb{R})$ be such that $\det(AB - BA) \leq 0$. Consider the polynomial

$$f(X) = \det(I_2 + (1 - X)AB + XBA).$$

(a) Prove that $f(0) = f(1)$.
(b) Deduce that

$$\det(I_2 + AB) \leq \det\left(I_2 + \frac{1}{2}(AB + BA)\right).$$


8. Let $n \geq 3$ be an integer. Let $X \in M_2(\mathbb{R})$ be such that


(a) Prove that det X = 0. Hint: show that det(X* + Ih) = 0. 
(b) Let $t$ be the trace of $X$. Prove that


1" J pr? = De 


(c) Find all possible matrices X satisfying the original equation. 


9. Let $n > 2$ be a positive integer and let $A, B \in M_2(\mathbb{C})$ be two matrices such that $AB \neq BA$ and $(AB)^n = (BA)^n$. Prove that $(AB)^n = \alpha I_2$ for some complex number $\alpha$.

10. Let $A, B \in M_2(\mathbb{R})$ and let $n > 1$ be an integer such that $C^n = I_2$, where $C = AB - BA$. Prove that $n$ is even and $C^4 = I_2$. Hint: use Problem 2.15.


2.3 The Powers of a Square Matrix of Order 2 


In this section we will use the Cayley–Hamilton theorem to compute the powers of a given matrix $A \in M_2(\mathbb{C})$. Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of $A$. The discussion and the final result will be very different according to whether $\lambda_1$ and $\lambda_2$ are different or not.

Let us start with the case $\lambda_1 = \lambda_2$ and consider the matrix $B = A - \lambda_1 I_2$. Then the Cayley–Hamilton theorem in the form of relation (2.2) yields $B^2 = O_2$, thus $B^k = O_2$ for $k \geq 2$. Using the binomial formula we obtain

$$A^n = (B + \lambda_1 I_2)^n = \sum_{k=0}^{n} \binom{n}{k} \lambda_1^{n-k} B^k = \lambda_1^n I_2 + n\lambda_1^{n-1} B.$$
k=0 


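Let us assume now that $\lambda_1 \neq \lambda_2$ and consider the matrices

$$B = A - \lambda_1 I_2 \quad\text{and}\quad C = A - \lambda_2 I_2.$$

Relation (2.2) becomes $BC = O_2$, or equivalently $B(A - \lambda_2 I_2) = O_2$. Thus $BA = \lambda_2 B$, which yields $BA^2 = \lambda_2 BA = \lambda_2^2 B$ and, by an immediate induction, $BA^n = \lambda_2^n B$ for all $n$. Similarly, the relation $CB = O_2$ (note that $B$ and $C$ commute) yields $CA^n = \lambda_1^n C$ for all $n$. Taking advantage of the relation $C - B = (\lambda_1 - \lambda_2) I_2$, we obtain

$$(\lambda_1 - \lambda_2) A^n = (C - B) A^n = CA^n - BA^n = \lambda_1^n C - \lambda_2^n B.$$

Thus

$$A^n = \frac{1}{\lambda_1 - \lambda_2}\left(\lambda_1^n C - \lambda_2^n B\right).$$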


All in all, we proved the following useful result, in which we change notations: 
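Theorem 2.25. Let $A \in M_2(\mathbb{C})$ and let $\lambda_1, \lambda_2$ be its eigenvalues.
(a) If $\lambda_1 \neq \lambda_2$, then for all $n \geq 1$ we have $A^n = \lambda_1^n B + \lambda_2^n C$, where

$$B = \frac{1}{\lambda_1 - \lambda_2}(A - \lambda_2 I_2) \quad\text{and}\quad C = \frac{1}{\lambda_2 - \lambda_1}(A - \lambda_1 I_2).$$

(b) If $\lambda_1 = \lambda_2$, then for all $n \geq 1$ we have $A^n = \lambda_1^n B + n\lambda_1^{n-1} C$, where $B = I_2$ and $C = A - \lambda_1 I_2$.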


Problem 2.26. Compute $A^n$, where $A = \begin{bmatrix} 1 & 3 \\ -3 & -5 \end{bmatrix}$.

Solution. As $\operatorname{Tr}(A) = -4$ and $\det A = 4$, the eigenvalues of $A$ are the solutions of the equation $t^2 + 4t + 4 = 0$, thus $\lambda_1 = \lambda_2 = -2$ are the eigenvalues of $A$. Using the previous theorem, we conclude that for any $n \geq 1$ we have

$$A^n = (-2)^n I_2 + n(-2)^{n-1}(A + 2I_2) = (-2)^{n-1}\begin{bmatrix} 3n - 2 & 3n \\ -3n & -3n - 2 \end{bmatrix}.$$

□
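Theorem 2.25 and Problem 2.26 can be checked numerically. The following Python sketch is only an illustration (the helper names `mat_mul`, `power_naive` and `power_by_theorem` are our own): it computes $A^n$ from the eigenvalues, using complex floating-point arithmetic, so it agrees with exact powers only up to rounding. It also treats eigenvalues closer than $10^{-12}$ as equal, an assumption that is adequate for small integer examples but not numerically robust in general.

```python
import cmath


def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def power_naive(A, n):
    """A^n by repeated multiplication (reference value), n >= 0."""
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = mat_mul(R, A)
    return R


def power_by_theorem(A, n):
    """A^n for n >= 1 following Theorem 2.25, with floating-point eigenvalues."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)  # roots of t^2 - tr*t + det = 0
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2
    I = [[1, 0], [0, 1]]
    if abs(l1 - l2) > 1e-12:
        # Case (a): A^n = l1^n B + l2^n C
        B = [[(A[i][j] - l2 * I[i][j]) / (l1 - l2) for j in range(2)] for i in range(2)]
        C = [[(A[i][j] - l1 * I[i][j]) / (l2 - l1) for j in range(2)] for i in range(2)]
        return [[l1 ** n * B[i][j] + l2 ** n * C[i][j] for j in range(2)] for i in range(2)]
    # Case (b): A^n = l1^n I_2 + n l1^(n-1) (A - l1 I_2)
    return [[l1 ** n * I[i][j] + n * l1 ** (n - 1) * (A[i][j] - l1 * I[i][j])
             for j in range(2)] for i in range(2)]
```

For the matrix of Problem 2.26 the discriminant is exactly 0, so the sketch follows case (b); for a matrix such as $\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$ it follows case (a).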


Though the exact statement of the previous theorem is a little cumbersome, the basic idea is very simple. If one learns this idea, then one can compute $A^n$ easily. Keep in mind that when computing powers of a $2 \times 2$ matrix, one starts by computing the eigenvalues of the matrix (this comes down to solving the quadratic equation $t^2 - \operatorname{Tr}(A)t + \det A = 0$). If the eigenvalues are equal, say both equal to $\lambda$, then $B := A - \lambda I_2$ satisfies $B^2 = O_2$, and so one computes $A^n$ by writing $A = B + \lambda I_2$ and using the binomial formula. On the other hand, if the eigenvalues are different, say $\lambda_1$ and $\lambda_2$, then there are two matrices $B, C$ such that for all $n$ we have

$$A^n = \lambda_1^n B + \lambda_2^n C.$$

One can easily find these matrices without having to learn the formulae by heart: if the previous relation holds for all $n \geq 0$, then it certainly holds for $n = 0$ and $n = 1$. Thus

$$I_2 = B + C, \qquad A = \lambda_1 B + \lambda_2 C.$$

This immediately yields the matrices $B$ and $C$ in terms of $I_2$, $A$ and $\lambda_1, \lambda_2$. Moreover, we see that they are of the form $xI_2 + yA$ for some complex numbers $x, y$. Combining this observation with Theorem 2.25 yields the following useful result.


Corollary 2.27. For any matrix $A \in M_2(\mathbb{C})$ there are sequences $(x_n)_{n \geq 0}$, $(y_n)_{n \geq 0}$ of complex numbers such that

$$A^n = x_n A + y_n I_2$$

for all $n \geq 0$.
One has to be careful that the sequences $(x_n)_{n \geq 0}$ and $(y_n)_{n \geq 0}$ in the previous corollary are definitely not always characterized by the equality $A^n = x_n A + y_n I_2$ (this is however the case if $A$ is not scalar). On the other hand, Theorem 2.25 shows that we can take

$$x_n = \frac{\lambda_1^n - \lambda_2^n}{\lambda_1 - \lambda_2} \quad\text{and}\quad y_n = \frac{\lambda_1\lambda_2^n - \lambda_2\lambda_1^n}{\lambda_1 - \lambda_2}$$

when $\lambda_1 \neq \lambda_2$, and, when $\lambda_1 = \lambda_2$,

$$x_n = n\lambda_1^{n-1} \quad\text{and}\quad y_n = -(n - 1)\lambda_1^n.$$
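The coefficients of Corollary 2.27 can also be generated without computing eigenvalues. Multiplying $A^n = x_nA + y_nI_2$ by $A$ and using the Cayley–Hamilton relation $A^2 = \operatorname{Tr}(A)A - (\det A)I_2$ gives $x_{n+1} = \operatorname{Tr}(A)x_n + y_n$ and $y_{n+1} = -(\det A)x_n$, starting from $x_0 = 0$, $y_0 = 1$. The Python sketch below (function name ours) runs this recurrence in exact integer arithmetic; it produces one valid choice of $(x_n, y_n)$, which is the only one when $A$ is not scalar.

```python
def corollary_coefficients(trace, det, n):
    """Return (x_n, y_n) with A^n = x_n*A + y_n*I_2, for any 2x2 matrix A
    having the given trace and determinant (n >= 0)."""
    x, y = 0, 1  # A^0 = 0*A + 1*I_2
    for _ in range(n):
        # A^(k+1) = x*A^2 + y*A = x*(trace*A - det*I_2) + y*A
        x, y = trace * x + y, -det * x
    return x, y


# Problem 2.26: trace -4, determinant 4, double eigenvalue -2, so the
# formulas above predict x_n = n(-2)^(n-1) and y_n = -(n-1)(-2)^n.
for n in range(1, 6):
    assert corollary_coefficients(-4, 4, n) == (n * (-2) ** (n - 1), -(n - 1) * (-2) ** n)
```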


Problem 2.28. Let $m, n$ be positive integers and let $A, B \in M_2(\mathbb{C})$ be two matrices such that $A^m B^n = B^n A^m$. If $A^m$ and $B^n$ are not scalar, prove that $AB = BA$.

Solution. From Corollary 2.27 we have

$$A^k = x_k A + y_k I_2 \quad\text{and}\quad B^k = u_k B + v_k I_2, \qquad k \geq 0,$$

where $(x_k)_{k \geq 0}$, $(y_k)_{k \geq 0}$, $(u_k)_{k \geq 0}$, $(v_k)_{k \geq 0}$ are sequences of complex numbers. Since $A^m$ and $B^n$ are not scalar matrices, it follows that $x_m \neq 0$ and $u_n \neq 0$. The relation $A^m B^n = B^n A^m$ is equivalent to

$$(x_m A + y_m I_2)(u_n B + v_n I_2) = (u_n B + v_n I_2)(x_m A + y_m I_2),$$

that is,

$$x_m u_n (AB - BA) = O_2.$$

Hence $AB = BA$. □
Problem 2.29. Let $t \in \mathbb{R}$ and let

$$A_t = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}.$$

Compute $A_t^n$ for $n \geq 1$.

Solution. We offer three ways to solve this problem. The first is to follow the usual procedure: compute the eigenvalues of $A_t$, and then use the general Theorem 2.25. Here the eigenvalues are $e^{it}$ and $e^{-it}$, and it is not difficult to deduce that

$$A_t^n = \begin{bmatrix} \cos nt & -\sin nt \\ \sin nt & \cos nt \end{bmatrix} = A_{nt}.$$

Another argument is as follows: an explicit computation shows that $A_{t+u} = A_t A_u$, thus

$$A_t^n = A_t A_t \cdots A_t = A_{t + t + \cdots + t} = A_{nt}.$$

Finally, one can also argue geometrically: $A_t$ is the matrix of a rotation of angle $t$, thus $A_t^n$ is the matrix of a rotation of angle $nt$. □
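The identities $A_{t+u} = A_tA_u$ and $A_t^n = A_{nt}$ are easy to test numerically. This Python sketch (helper names ours) builds $A_t$ with floating-point cosines and sines, so comparisons use a small tolerance rather than exact equality.

```python
import math


def rotation(t):
    """The matrix A_t of Problem 2.29, as nested lists."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def mat_mul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_pow(A, n):
    """A^n by repeated multiplication, n >= 0."""
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        R = mat_mul(R, A)
    return R


def is_close(X, Y, tol=1e-9):
    """Entrywise comparison of two 2x2 matrices up to tol."""
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))


# A_t^n should coincide with A_{nt} (up to rounding).
for t in (0.3, 1.0, 2.5):
    for n in range(1, 8):
        assert is_close(mat_pow(rotation(t), n), rotation(n * t))
```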


2.3.1 Problems for Practice 


1. Consider the matrix

(a) Let $n$ be a positive integer. Prove the existence of a unique pair of integers $(x_n, y_n)$ such that

$$A^n = x_n A + y_n I_2.$$

(b) Compute $\lim_{n \to \infty} \frac{x_n}{y_n}$.

2. Given a positive integer $n$, compute the $n$th power of the matrix


3. Let a, b be real numbers and let n be a positive integer. Compute the nth power 
of the matrix l ; i 
Oa 


4. Let $x$ be a real number and let

$$A = \begin{bmatrix} \cos x + \sin x & 2\sin x \\ -\sin x & \cos x - \sin x \end{bmatrix}.$$

Compute $A^n$ for all positive integers $n$.


2.4 Application to Linear Recurrences 


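In this section we present two classical applications of the theory developed in the previous section. Let $a, b, c, d, x_0, y_0$ be complex numbers and consider two sequences $(x_n)_{n \geq 0}$ and $(y_n)_{n \geq 0}$ recursively defined by

$$\begin{cases} x_{n+1} = a x_n + b y_n \\ y_{n+1} = c x_n + d y_n \end{cases}, \qquad n \geq 0. \tag{2.3}$$

We would like to find the general terms of the two sequences in terms of the initial data $a, b, c, d, x_0, y_0$ and $n$.
The key observation is that the system can be written in matrix form as follows:

$$\begin{bmatrix} x_{n+1} \\ y_{n+1} \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_n \\ y_n \end{bmatrix} \iff \begin{bmatrix} x_{n+1} \\ y_{n+1} \end{bmatrix} = A \begin{bmatrix} x_n \\ y_n \end{bmatrix}, \qquad n \geq 0,$$

where $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is the matrix of coefficients. An immediate induction yields

$$\begin{bmatrix} x_n \\ y_n \end{bmatrix} = A^n \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}, \qquad n \geq 0, \tag{2.4}$$

and so the problem is reduced to the computation of $A^n$, which is solved by Theorem 2.25.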

Let us consider a slightly different problem, also very classical. It concerns second order linear recurrences with constant coefficients. More precisely, we fix complex numbers $a, b, x_0, x_1$ and look for the general term of the sequence $(x_n)_{n \geq 0}$ defined recursively by

$$x_{n+1} = a x_n + b x_{n-1}, \qquad n \geq 1. \tag{2.5}$$

We can easily reduce this problem to the previous one by denoting $y_n = x_{n-1}$ for $n \geq 1$ and $y_0 = \frac{1}{b}(x_1 - a x_0)$ if $b \neq 0$ (which we will assume from now on, since otherwise the problem is very simple from the very beginning). Indeed, relation (2.5) is equivalent to the following system:

$$\begin{cases} x_{n+1} = a x_n + b y_n \\ y_{n+1} = x_n \end{cases}, \qquad n \geq 0.$$

As we have already seen, finding $x_n$ and $y_n$ (or equivalently $x_n$) comes down to computing the powers of the matrix of coefficients

$$A = \begin{bmatrix} a & b \\ 1 & 0 \end{bmatrix}.$$

The characteristic equation of this matrix is $\lambda^2 - a\lambda - b = 0$. If $\lambda_1$ and $\lambda_2$ are the roots of this equation, then Theorem 2.25 yields the following:
• If $\lambda_1 \neq \lambda_2$, then we can find constants $u, v$ such that

$$x_n = u\lambda_1^n + v\lambda_2^n$$

for all $n$. These two constants are easily determined by imposing

$$x_0 = u + v \quad\text{and}\quad x_1 = u\lambda_1 + v\lambda_2$$

and solving this linear system in the unknowns $u, v$.
• If $\lambda_1 = \lambda_2$, then we can find constants $u, v$ such that for all $n \geq 0$

$$x_n = (un + v)\lambda_1^n,$$

and $u$ and $v$ are found from the initial conditions by solving $x_0 = v$ and $x_1 = (u + v)\lambda_1$.
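The two cases above translate into a short procedure. The following Python sketch (function names ours) computes the roots of $t^2 - at - b = 0$ in floating point, solves for $u, v$ exactly as described, and compares the resulting closed form with direct iteration of (2.5). It assumes $b \neq 0$, as in the text, and it treats roots closer than $10^{-12}$ as equal, which is a simplification rather than a robust test.

```python
import cmath


def closed_form(a, b, x0, x1):
    """Return n -> x_n for x_{n+1} = a*x_n + b*x_{n-1}, assuming b != 0."""
    disc = cmath.sqrt(a * a + 4 * b)  # roots of t^2 - a*t - b = 0
    l1, l2 = (a + disc) / 2, (a - disc) / 2
    if abs(l1 - l2) > 1e-12:
        # Distinct roots: solve x0 = u + v, x1 = u*l1 + v*l2.
        u = (x1 - l2 * x0) / (l1 - l2)
        v = x0 - u
        return lambda n: u * l1 ** n + v * l2 ** n
    # Equal roots: x0 = v, x1 = (u + v)*l1, and l1 != 0 because b != 0.
    v = x0
    u = x1 / l1 - v
    return lambda n: (u * n + v) * l1 ** n


def by_iteration(a, b, x0, x1, n):
    """x_n computed directly from the recurrence."""
    if n == 0:
        return x0
    prev, cur = x0, x1
    for _ in range(n - 1):
        prev, cur = cur, a * cur + b * prev
    return cur
```

For instance, $a = b = 1$, $x_0 = 0$, $x_1 = 1$ gives the Fibonacci numbers (distinct roots), while $a = 4$, $b = -4$, $x_0 = 1$, $x_1 = 4$ has the double root $2$ and gives $x_n = (n + 1)2^n$.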


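Problem 2.30. Find the general terms of $(x_n)_{n \geq 0}$, $(y_n)_{n \geq 0}$ if

$$\begin{cases} x_{n+1} = x_n + 2y_n \\ y_{n+1} = -2x_n + 5y_n \end{cases}, \qquad n \geq 0,$$

and $x_0 = 1$, $y_0 = 2$.

Solution. The matrix of coefficients is $A = \begin{bmatrix} 1 & 2 \\ -2 & 5 \end{bmatrix}$, with characteristic equation $\lambda^2 - 6\lambda + 9 = 0$ and solutions $\lambda_1 = \lambda_2 = 3$. Theorem 2.25 yields (after a simple computation)

$$A^n = 3^n I_2 + n3^{n-1}\begin{bmatrix} -2 & 2 \\ -2 & 2 \end{bmatrix} = \begin{bmatrix} (3 - 2n)3^{n-1} & 2n3^{n-1} \\ -2n3^{n-1} & (3 + 2n)3^{n-1} \end{bmatrix}.$$

Combined with $x_0 = 1$ and $y_0 = 2$, we obtain

$$x_n = (2n + 3)3^{n-1} \quad\text{and}\quad y_n = 2(n + 3)3^{n-1}, \qquad n \geq 0.$$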
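A quick way to confirm the answer to Problem 2.30 is to iterate the system and compare with the closed forms. In this Python sketch (function name ours) both sides are multiplied by 3, so that the case $n = 0$, where $3^{n-1} = 1/3$, stays in integer arithmetic.

```python
def iterate_system(n, x0=1, y0=2):
    """(x_n, y_n) for Problem 2.30, computed directly from the recurrence."""
    x, y = x0, y0
    for _ in range(n):
        x, y = x + 2 * y, -2 * x + 5 * y
    return x, y


for n in range(15):
    x, y = iterate_system(n)
    # 3*x_n = (2n + 3) 3^n and 3*y_n = 2(n + 3) 3^n
    assert 3 * x == (2 * n + 3) * 3 ** n
    assert 3 * y == 2 * (n + 3) * 3 ** n
```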


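Problem 2.31. Find the limits of the sequences $(x_n)_{n \geq 0}$ and $(y_n)_{n \geq 0}$, where

$$\begin{cases} x_{n+1} = (1 - \alpha)x_n + \alpha y_n \\ y_{n+1} = \beta x_n + (1 - \beta)y_n \end{cases}$$

and $\alpha, \beta$ are complex numbers with $|1 - \alpha - \beta| < 1$.

Solution. The matrix of coefficients is

$$A = \begin{bmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{bmatrix}$$

and one easily checks that its eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 1 - \alpha - \beta$. Note that $|\lambda_2| < 1$; in particular $\lambda_2 \neq 1$. Letting

$$B = \frac{1}{1 - \lambda_2}(A - \lambda_2 I_2) = \frac{1}{\alpha + \beta}\begin{bmatrix} \beta & \alpha \\ \beta & \alpha \end{bmatrix},$$

Theorem 2.25 gives the existence of an explicit matrix $C$ such that

$$A^n = \lambda_1^n B + \lambda_2^n C = B + \lambda_2^n C.$$

Since $|\lambda_2| < 1$, we have $\lim_{n \to \infty} \lambda_2^n = 0$, and the previous relation shows that $\lim_{n \to \infty} A^n = B$.

Since $\begin{bmatrix} x_n \\ y_n \end{bmatrix} = A^n \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$, we conclude that $(x_n)$ and $(y_n)$ are convergent sequences, and if $l_1, l_2$ are their limits, then

$$\begin{bmatrix} l_1 \\ l_2 \end{bmatrix} = B \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}.$$

Taking into account the explicit form of $B$, we obtain

$$\lim_{n \to \infty} x_n = \lim_{n \to \infty} y_n = \frac{\beta x_0 + \alpha y_0}{\alpha + \beta}.$$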

2.4.1 Problems for Practice 


1. Find the general term of the sequence $(x_n)_{n \geq 1}$ defined by $x_1 = 1$, $x_2 = 0$ and, for all $n \geq 1$,

$$x_{n+2} = 4x_{n+1} - x_n.$$

2. Consider the sequence $(x_n)_{n \geq 0}$ defined by $x_0 = 1$, $x_1 = 2$ and, for all $n \geq 0$,

$$x_{n+2} = x_{n+1} - x_n.$$

Is this sequence periodic? If so, find its minimal period.

3. Find the general terms of the sequences $(x_n)_{n \geq 0}$ and $(y_n)_{n \geq 0}$ satisfying $x_0 = y_0 = 1$, $x_1 = 1$, $y_1 = 2$ and


2Xn + 3Yn 2Yn + 3Xn 
Xn+1 = 2 ye Yn+1 = Te N 


4. A sequence $(x_n)_{n \geq 0}$ satisfies $x_0 = 2$, $x_1 = 3$ and, for all $n \geq 1$,

$$x_{n+1} = \sqrt{x_{n-1}x_n}.$$

Find the general term of this sequence (hint: take the logarithm of the recurrence relation).
5. Consider a map $f : (0, \infty) \to (0, \infty)$ such that

$$f(f(x)) = 6x - f(x)$$

for all $x > 0$. Let $x > 0$ and define a sequence $(z_n)_{n \geq 0}$ by $z_0 = x$ and $z_{n+1} = f(z_n)$ for $n \geq 0$.

(a) Prove that

$$z_{n+2} + z_{n+1} - 6z_n = 0$$

for $n \geq 0$.
(b) Deduce the existence of real numbers $a, b$ such that

$$z_n = a \cdot 2^n + b \cdot (-3)^n$$

for all $n \geq 0$.

(c) Using the fact that $z_n > 0$ for all $n$, prove that $b = 0$ and conclude that $f(x) = 2x$ for all $x > 0$.


2.5 Solving the Equation $X^n = A$


Consider a matrix $A \in M_2(\mathbb{C})$ and an integer $n \geq 1$. In this section we will explain how to solve the equation $X^n = A$, with $X \in M_2(\mathbb{C})$.
A first key observation is that for any solution $X$ of the equation we have

$$AX = XA.$$

Indeed, this is simply a consequence of the fact that $X^n \cdot X = X \cdot X^n$. We will need the following very useful result.
Proposition 2.32. Let $A \in M_2(\mathbb{C})$ be a non-scalar matrix. If $X \in M_2(\mathbb{C})$ commutes with $A$, then $X = \alpha I_2 + \beta A$ for some complex numbers $\alpha, \beta$.

Proof. Write $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $X = \begin{bmatrix} x & y \\ z & t \end{bmatrix}$. The equality $AX = XA$ is equivalent, after an elementary computation, to

$$bz = cy, \qquad ay + bt = bx + dy, \qquad cx + dz = az + ct,$$

or

$$bz = cy, \qquad (a - d)y = b(x - t), \qquad c(x - t) = z(a - d).$$

If $a \neq d$, set $\beta = \frac{x - t}{a - d}$. Then $z = c\beta$, $y = b\beta$ and $\beta a - x = \beta d - t$. We deduce that $X = \alpha I_2 + \beta A$, where $\alpha = -\beta a + x = -\beta d + t$.

Suppose that $a = d$. If $x \neq t$, the previous relations yield $b = c = 0$ and so $A$ is scalar, a contradiction. Thus $x = t$ and $bz = cy$. Moreover, one of $b, c$ is nonzero (as $A$ is not scalar), say $b$ (the argument when $c \neq 0$ is identical). Setting $\beta = \frac{y}{b}$ and $\alpha = x - \beta a$ yields $X = \alpha I_2 + \beta A$. □


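Let us come back to our original problem, solving the equation $X^n = A$. Let $\lambda_1$ and $\lambda_2$ be the eigenvalues of $A$. We will discuss several cases, each of them having a very different behavior.

Let us start with the case $\lambda_1 \neq \lambda_2$. By Problem 2.21, we can then write $A = P\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}P^{-1}$ for some $P \in GL_2(\mathbb{C})$. Since $AX = XA$ and $A$ is not scalar, by Proposition 2.32 there are complex numbers $a, b$ such that $X = aI_2 + bA$. Thus

$$X = P\begin{bmatrix} a + b\lambda_1 & 0 \\ 0 & a + b\lambda_2 \end{bmatrix}P^{-1}.$$

The equation $X^n = A$ is then equivalent to

$$\begin{bmatrix} (a + b\lambda_1)^n & 0 \\ 0 & (a + b\lambda_2)^n \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}.$$

It follows that $a + b\lambda_1 = z_1$ and $a + b\lambda_2 = z_2$, where $z_1^n = \lambda_1$ and $z_2^n = \lambda_2$, and

$$X = P\begin{bmatrix} z_1 & 0 \\ 0 & z_2 \end{bmatrix}P^{-1}.$$

Hence we obtain the following result.

Proposition 2.33. Let $A \in M_2(\mathbb{C})$ be a matrix with distinct eigenvalues $\lambda_1, \lambda_2$. Let $P \in GL_2(\mathbb{C})$ be a matrix such that $A = P\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}P^{-1}$. Then the solutions of the equation $X^n = A$ are given by $X = P\begin{bmatrix} z_1 & 0 \\ 0 & z_2 \end{bmatrix}P^{-1}$, where $z_1$ and $z_2$ are solutions of the equations $t^n = \lambda_1$ and $t^n = \lambda_2$ respectively.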


Let us deal now with the case in which $A$ is not scalar, but has equal eigenvalues, say both equal to $\lambda$. Then the matrix $B = A - \lambda I_2$ satisfies $B^2 = O_2$ (by the Cayley–Hamilton theorem) and we have $A = B + \lambda I_2$. Now, since $AX = XA$ and $A$ is not scalar, we can write $X = cI_2 + dA$ for some complex numbers $c, d$ (Proposition 2.32). Since $A = B + \lambda I_2$, it follows that we can also write $X = aI_2 + bB$ for some complex numbers $a, b$. Since $B^2 = O_2$, the binomial formula and the given equation yield

$$A = X^n = (aI_2 + bB)^n = a^n I_2 + na^{n-1}bB.$$

Since $A = B + \lambda I_2$, we obtain

$$B + \lambda I_2 = na^{n-1}bB + a^n I_2.$$

Since $B$ is not scalar (as $A$ itself is not scalar), the previous relation is equivalent to

$$1 = na^{n-1}b \quad\text{and}\quad \lambda = a^n.$$

For $n \geq 2$, this already shows that $\lambda \neq 0$ (as the first equation shows that $a \neq 0$), so if $\lambda = 0$ (which corresponds to $A^2 = O_2$) then the equation has no solution. On the other hand, if $\lambda \neq 0$, then the equation $a^n = \lambda$ has $n$ complex solutions, and for each of them we obtain a unique value of $b$, namely $b = \frac{1}{na^{n-1}}$. We have just proved the following result.


Proposition 2.34. Suppose that $A \in M_2(\mathbb{C})$ is not scalar, but both eigenvalues of $A$ are equal to some complex number $\lambda$. Then
(a) If $\lambda = 0$, the equation $X^n = A$ has no solutions for $n > 1$, and the only solution is $X = A$ for $n = 1$.
(b) If $\lambda \neq 0$, then the solutions of the equation $X^n = A$ are given by

$$X = aI_2 + \frac{1}{na^{n-1}}(A - \lambda I_2),$$

where $a$ runs over the $n$ solutions of the equation $z^n = \lambda$.
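Proposition 2.34(b) can be tested on the matrix of Problem 2.26, which is not scalar and has the double eigenvalue $-2$. The Python sketch below (function names ours) takes $A$ and $\lambda$ as inputs and does not verify the hypotheses itself; it lists the $n$ matrices $X$ from the proposition using `cmath.polar`/`cmath.rect` for the complex $n$th roots, so checks of $X^n = A$ hold only up to floating-point rounding.

```python
import cmath


def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_pow(A, n):
    """A^n by repeated multiplication, n >= 0."""
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = mat_mul(R, A)
    return R


def nth_roots(z, n):
    """The n complex solutions of w^n = z, for z != 0."""
    r, phi = cmath.polar(z)
    return [cmath.rect(r ** (1.0 / n), (phi + 2 * cmath.pi * k) / n) for k in range(n)]


def solutions_prop_2_34(A, lam, n):
    """Solutions of X^n = A from Proposition 2.34(b).

    Assumes (without checking) that A is not scalar and that both
    eigenvalues of A equal lam != 0.
    """
    sols = []
    for a in nth_roots(lam, n):
        coef = 1 / (n * a ** (n - 1))
        # X = a*I_2 + coef*(A - lam*I_2)
        X = [[(a if i == j else 0) + coef * (A[i][j] - (lam if i == j else 0))
              for j in range(2)] for i in range(2)]
        sols.append(X)
    return sols
```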


Finally, let us deal with the case when $A$ is scalar, say $A = cI_2$ for some complex number $c$. If $c = 0$, then the equation $X^n = O_2$ has already been solved, so let us assume that $c \neq 0$. Consider a solution $X$ of the equation $X^n = cI_2$ and let $\lambda_1, \lambda_2$ be the eigenvalues of $X$. Then $\lambda_1^n = \lambda_2^n = c$. We have two possibilities:

• Either $\lambda_1 \neq \lambda_2$, in which case $X$ has distinct eigenvalues and so (Problem 2.21) we can write $X = P\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}P^{-1}$ for some invertible matrix $P$. Then $X^n = P\begin{bmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{bmatrix}P^{-1}$, and this equals $cI_2$ since $\lambda_1^n = \lambda_2^n = c$. The conclusion is that for each pair $(\lambda_1, \lambda_2)$ of distinct solutions of the equation $t^n = c$ we obtain a whole family of solutions, namely the matrices $X = P\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}P^{-1}$ for some invertible matrix $P$.

• Suppose that $\lambda_1 = \lambda_2$ and let $Y = X - \lambda_1 I_2$; then $Y^2 = O_2$ and the equation $X^n = cI_2$ is equivalent to $(Y + \lambda_1 I_2)^n = cI_2$. Using again the binomial formula and the equality $Y^2 = O_2$, we can rewrite this equation as

$$\lambda_1^n I_2 + n\lambda_1^{n-1}Y = cI_2.$$

Since $\lambda_1^n = c$ and $\lambda_1 \neq 0$ (as $c \neq 0$), we deduce that necessarily $Y = O_2$, and so $X = \lambda_1 I_2$, with $\lambda_1$ one of the $n$ complex solutions of the equation $t^n = c$. Thus we obtain $n$ more solutions this way.


We can unify the previous two possibilities and obtain

Proposition 2.35. If $c \neq 0$ is a complex number, the solutions in $M_2(\mathbb{C})$ of the equation $X^n = cI_2$ are given by

$$X = P\begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix}P^{-1}, \tag{2.6}$$

where $x, y$ are solutions (not necessarily distinct) of the equation $z^n = c$, and $P \in GL_2(\mathbb{C})$ is arbitrary.


Problem 2.36. Let $t \in (0, \pi)$ be a real number and let $n > 1$ be an integer. Find all matrices $X \in M_2(\mathbb{R})$ such that

$$X^n = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}.$$

Solution. With the notations of Problem 2.29, we need to solve the equation $X^n = A_t$. Let $X$ be a solution; then $XA_t = A_tX = X^{n+1}$. Writing $X = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the relation $XA_t = A_tX$ yields $b\sin t = -c\sin t$ and $-a\sin t = -d\sin t$, thus $a = d$ and $c = -b$ (recall that $\sin t \neq 0$). Hence $X = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$. Next, since $X^n = A_t$, we have

$$(\det X)^n = \det X^n = \det A_t = 1,$$

and since $\det X = a^2 + b^2 \geq 0$, we deduce that $a^2 + b^2 = 1$. Thus we can write $a = \cos x$ and $b = -\sin x$ for some real number $x$. Then $X = A_x$ and the equation $X^n = A_t$ is equivalent (thanks to Problem 2.29) to $A_{nx} = A_t$. This is further equivalent to $nx = t + 2k\pi$ for some integer $k$. It is enough to restrict to $k \in \{0, 1, \ldots, n - 1\}$. We conclude that the solutions of the problem are the matrices

$$X_k = \begin{bmatrix} \cos t_k & -\sin t_k \\ \sin t_k & \cos t_k \end{bmatrix}, \qquad t_k = \frac{t + 2k\pi}{n}, \quad k = 0, 1, \ldots, n - 1.$$

□
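The answer to Problem 2.36 can be spot-checked numerically: each $X_k = A_{t_k}$ should satisfy $X_k^n = A_t$, and the $n$ matrices should be pairwise different. This Python sketch (helper names ours) uses floating-point trigonometry, so the comparisons rely on a tolerance.

```python
import math


def rotation(t):
    """The matrix A_t = [[cos t, -sin t], [sin t, cos t]] of Problem 2.29."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_pow(A, n):
    """A^n by repeated multiplication, n >= 0."""
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        R = mat_mul(R, A)
    return R


def roots_of_rotation(t, n):
    """The matrices X_k = A_{t_k}, t_k = (t + 2k*pi)/n, k = 0, ..., n-1."""
    return [rotation((t + 2 * k * math.pi) / n) for k in range(n)]
```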


Problem 2.37. Let $A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \in M_2(\mathbb{R})$. Prove that the following statements are equivalent:

(1) $A^n = I_2$ for some positive integer $n$;
(2) $a = \cos r\pi$, $b = \sin r\pi$ for some rational number $r$.

Solution. If $a = \cos\left(\frac{k}{n}\pi\right)$ and $b = \sin\left(\frac{k}{n}\pi\right)$ for some $n \geq 1$ and $k \in \mathbb{Z}$, then Problem 2.29 yields $A^{2n} = I_2$, thus (2) implies (1).

Assume now that (1) holds. Then $(\det A)^n = \det A^n = 1$ and since $\det A = a^2 + b^2 \geq 0$, we must have $\det A = 1$, that is $a^2 + b^2 = 1$. Thus we can find $t \in \mathbb{R}$ such that $a = \cos t$ and $b = \sin t$. Then $A = A_t$ and by Problem 2.29 we have $I_2 = A^n = A_{nt}$. This forces $\cos(nt) = 1$ and so $t$ is a rational multiple of $\pi$. The problem is solved. □


2.5.1 Problems for Practice 


1. Let $n > 1$ be an integer. Prove that the equation

$$X^n = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

has no solutions in $M_2(\mathbb{C})$.
2. Solve in $M_2(\mathbb{C})$ the binomial equation


3. Let $n > 1$ be an integer. Prove that the equation


eaha 
0 0 


has no solutions in M2 (Q). 
4. Find all matrices $X \in M_2(\mathbb{R})$ such that


ga) 42 
~ | —3 -2 |" 


5. Find all matrices $A, B \in M_2(\mathbb{C})$ such that
AB =O, and A+ B= 0). 


6. Solve in $M_2(\mathbb{R})$ the equation


»_{ 7-5 
i =| is a 


7. Solve in $M_2(\mathbb{R})$ the equation


2.6 Application to Pell’s Equations 


Let $D > 1$ be an integer which is not a perfect square. The Diophantine equation

$$x^2 - Dy^2 = 1, \tag{2.7}$$

called Pell's equation, has an obvious solution $(1, 0)$ in nonnegative integers. A well-known but nontrivial result (which we take for granted) is that this equation also has nontrivial solutions (i.e., different from $(1, 0)$).

In this section we explain how the theory developed so far allows finding all solutions of the Pell equation once we know the smallest nontrivial solution. Let $S_D$ be the set of all solutions in positive integers to Eq. (2.7) and let $(x_1, y_1)$ be the fundamental solution, i.e., the solution in $S_D$ for which the first component $x_1$ is minimal among the first components of the elements of $S_D$.

If $x, y$ are positive integers, consider the matrix

$$A_{(x,y)} = \begin{bmatrix} x & Dy \\ y & x \end{bmatrix},$$

so that $(x, y) \in S_D$ if and only if $\det A_{(x,y)} = 1$. Elementary computations yield the fundamental relation

$$A_{(x,y)} \cdot A_{(u,v)} = A_{(xu + Dyv,\, xv + yu)}. \tag{2.8}$$
Passing to determinants in (2.8) we obtain the multiplication principle:

if $(x, y), (u, v) \in S_D$, then $(xu + Dyv, xv + yu) \in S_D$.

It follows from the multiplication principle that if we write

$$A_{(x_1,y_1)}^n = \begin{bmatrix} x_n & Dy_n \\ y_n & x_n \end{bmatrix}, \qquad n \geq 1,$$

then $(x_n, y_n) \in S_D$ for all $n$. The sequences $x_n$ and $y_n$ are described by the recursive system

$$\begin{cases} x_{n+1} = x_1 x_n + Dy_1 y_n \\ y_{n+1} = y_1 x_n + x_1 y_n \end{cases}, \qquad n \geq 1, \tag{2.9}$$

a consequence of the equality $A_{(x_1,y_1)}^{n+1} = A_{(x_1,y_1)}^n A_{(x_1,y_1)}$. Moreover, Theorem 2.25 gives explicit formulae for $x_n$ and $y_n$ in terms of $x_1, y_1, n$: the characteristic equation of the matrix $A_{(x_1,y_1)}$ is

$$\lambda^2 - 2x_1\lambda + 1 = 0,$$

with $\lambda_{1,2} = x_1 \pm \sqrt{x_1^2 - 1} = x_1 \pm y_1\sqrt{D}$, and Theorem 2.25 yields, after an elementary computation of the matrices $B, C$ involved in that theorem,

$$\begin{aligned} x_n &= \frac{1}{2}\left[(x_1 + y_1\sqrt{D})^n + (x_1 - y_1\sqrt{D})^n\right], \\ y_n &= \frac{1}{2\sqrt{D}}\left[(x_1 + y_1\sqrt{D})^n - (x_1 - y_1\sqrt{D})^n\right], \qquad n \geq 1. \end{aligned} \tag{2.10}$$

Note that relation (2.10) also makes sense for $n = 0$, in which case it gives the trivial solution $(x_0, y_0) = (1, 0)$.


Theorem 2.38. All solutions in positive integers of the Pell equation $x^2 - Dy^2 = 1$ are described by the formula (2.10), where $(x_1, y_1)$ is the fundamental solution of the equation.


Proof. Suppose that there are elements in $S_D$ which are not covered by formula (2.10), and among them choose one, $(x, y)$, for which $x$ is minimal. Since $\det A_{(x_1,y_1)} = 1$, we have $A_{(x_1,y_1)}^{-1} = \begin{bmatrix} x_1 & -Dy_1 \\ -y_1 & x_1 \end{bmatrix}$, and by the multiplication principle the matrix $A_{(x,y)}A_{(x_1,y_1)}^{-1}$ generates a solution in integers $(x', y')$, where

$$x' = x_1 x - Dy_1 y, \qquad y' = x_1 y - y_1 x.$$

We claim that $x', y'$ are positive integers. This is clear for $x'$, as $x > \sqrt{D}\,y$ and $x_1 > \sqrt{D}\,y_1$, thus $xx_1 > Dyy_1$. Also, squaring and using $Dy^2 = x^2 - 1$, $Dy_1^2 = x_1^2 - 1$, we see that $x_1 y > y_1 x$ is equivalent to $x_1^2(x^2 - 1) > x^2(x_1^2 - 1)$, or $x > x_1$, which holds because $(x_1, y_1)$ is the fundamental solution and $(x, y)$ is not described by relation (2.10) (while $(x_1, y_1)$ is described by this relation, with $n = 1$). Moreover, since $A_{(x',y')}A_{(x_1,y_1)} = A_{(x,y)}$, we have $x = x'x_1 + Dy'y_1 > x'$ and $y = x'y_1 + y'x_1 > y'$. By minimality, $(x', y')$ must be of the form (2.10), i.e., $A_{(x',y')} = A_{(x_1,y_1)}^k$ for some positive integer $k$. Therefore $A_{(x,y)} = A_{(x_1,y_1)}^{k+1}$, i.e., $(x, y)$ is of the form (2.10), a contradiction. □


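Problem 2.39. Find all solutions in positive integers to Pell's equation

$$x^2 - 2y^2 = 1.$$

Solution. The fundamental solution is $(x_1, y_1) = (3, 2)$ and the associated matrix is

$$A_{(3,2)} = \begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix}.$$

The solutions $(x_n, y_n)_{n \geq 1}$ are given by $A_{(3,2)}^n = A_{(x_n,y_n)}$, i.e.,

$$x_n = \frac{1}{2}\left[(3 + 2\sqrt{2})^n + (3 - 2\sqrt{2})^n\right], \qquad y_n = \frac{1}{2\sqrt{2}}\left[(3 + 2\sqrt{2})^n - (3 - 2\sqrt{2})^n\right].$$

□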
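Recurrence (2.9) gives an integer-only way to list the solutions, which avoids the irrational numbers in (2.10). The Python sketch below (function names ours) generates solutions from a given fundamental solution and, for a small search range, compares them with a brute-force search using `math.isqrt`; the brute force is only an illustration of Theorem 2.38 on a finite range, not a proof.

```python
import math


def pell_solutions(D, x1, y1, count):
    """First `count` solutions (x_n, y_n), n = 1, 2, ..., of x^2 - D*y^2 = 1,
    generated from the fundamental solution (x1, y1) by recurrence (2.9)."""
    sols = []
    x, y = x1, y1
    for _ in range(count):
        sols.append((x, y))
        x, y = x1 * x + D * y1 * y, y1 * x + x1 * y
    return sols


def brute_force(D, y_max):
    """All positive solutions with y <= y_max, found by direct search."""
    found = []
    for y in range(1, y_max + 1):
        x = math.isqrt(1 + D * y * y)
        if x * x == 1 + D * y * y:
            found.append((x, y))
    return found
```

For $D = 2$ and $(x_1, y_1) = (3, 2)$, the first four generated pairs are $(3, 2)$, $(17, 12)$, $(99, 70)$, $(577, 408)$.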


We can extend slightly the study of the Pell equation by considering the more general equation

$$ax^2 - by^2 = 1, \tag{2.11}$$

where we assume that $ab$ is not a perfect square (it is not difficult to see that if $ab$ is a square, then the equation has only trivial solutions). Contrary to the Pell equation, Eq. (2.11) does not always have solutions (the reader can check that the equation $3x^2 - y^2 = 1$ has no solutions in integers by working modulo 3).

Define the Pell resolvent of (2.11) by

$$u^2 - abv^2 = 1, \tag{2.12}$$

and let $S_{a,b}$ be the set of solutions in positive integers of Eq. (2.11). Thus $S_{1,ab}$ is the set denoted $S_{ab}$ when considering the Pell equation (it is the set of solutions of the Pell resolvent). If $x, y, u, v$ are positive integers, consider the matrices

$$B_{(x,y)} = \begin{bmatrix} x & by \\ y & ax \end{bmatrix}, \qquad A_{(u,v)} = \begin{bmatrix} u & abv \\ v & u \end{bmatrix},$$

the second matrix being the matrix associated with the Pell resolvent equation.
An elementary computation shows that

$$B_{(x,y)} A_{(u,v)} = B_{(xu + byv,\, axv + yu)}.$$

Passing to determinants in the above relation and noting that $(x, y) \in S_{a,b}$ if and only if $\det B_{(x,y)} = 1$, we obtain the multiplication principle:

if $(x, y) \in S_{a,b}$ and $(u, v) \in S_{1,ab}$, then $(xu + byv, axv + yu) \in S_{a,b}$,

i.e., the product $B_{(x,y)}A_{(u,v)}$ generates the solution $(xu + byv, axv + yu)$ of (2.11). Using the previous theorem and the multiplication principle, one easily obtains the following result, whose formal proof is left to the reader.

Theorem 2.40. Assume that Eq. (2.11) is solvable in positive integers, and let $(x_0, y_0)$ be its minimal solution (i.e., $x_0$ is minimal). Let $(u_1, v_1)$ be the fundamental solution of the resolvent Pell equation (2.12). Then all solutions $(x_n, y_n)$ in positive integers of Eq. (2.11) are generated by

$$B_{(x_n,y_n)} = B_{(x_0,y_0)} A_{(u_1,v_1)}^n, \qquad n \geq 0. \tag{2.13}$$

It follows easily from (2.13) that

$$\begin{cases} x_n = x_0 u_n + by_0 v_n \\ y_n = y_0 u_n + ax_0 v_n \end{cases}, \qquad n \geq 0, \tag{2.14}$$

where $(u_n, v_n)_{n \geq 0}$ is the general solution to the Pell resolvent equation, with $(u_0, v_0) = (1, 0)$.


Problem 2.41. Solve in positive integers the equation

$$6x^2 - 5y^2 = 1.$$

Solution. This equation is solvable and its minimal solution is $(x_0, y_0) = (1, 1)$. The Pell resolvent equation is $u^2 - 30v^2 = 1$, with fundamental solution $(u_1, v_1) = (11, 2)$. Using formulae (2.14) and then (2.10), we deduce that the solutions in positive integers are $(x_n, y_n)_{n \geq 0}$, where

$$\begin{aligned} x_n &= \frac{6 + \sqrt{30}}{12}(11 + 2\sqrt{30})^n + \frac{6 - \sqrt{30}}{12}(11 - 2\sqrt{30})^n, \\ y_n &= \frac{5 + \sqrt{30}}{10}(11 + 2\sqrt{30})^n + \frac{5 - \sqrt{30}}{10}(11 - 2\sqrt{30})^n. \end{aligned}$$
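Formula (2.14) combined with recurrence (2.9) for the resolvent gives the solutions of Problem 2.41 in integer arithmetic. In this Python sketch (function name ours) the minimal solution $(x_0, y_0)$ and the fundamental resolvent solution $(u_1, v_1)$ are inputs that must already be known; the code does not search for them.

```python
def general_pell_solutions(a, b, x0, y0, u1, v1, count):
    """Solutions (x_n, y_n), n = 0, ..., count-1, of a*x^2 - b*y^2 = 1 via (2.14).

    (x0, y0): minimal solution of a*x^2 - b*y^2 = 1 (assumed known);
    (u1, v1): fundamental solution of the resolvent u^2 - a*b*v^2 = 1.
    """
    sols = []
    u, v = 1, 0  # (u_0, v_0), the trivial solution of the resolvent
    for _ in range(count):
        sols.append((x0 * u + b * y0 * v, y0 * u + a * x0 * v))  # formula (2.14)
        u, v = u1 * u + a * b * v1 * v, v1 * u + u1 * v  # recurrence (2.9) with D = ab
    return sols
```

For $a = 6$, $b = 5$, $(x_0, y_0) = (1, 1)$ and $(u_1, v_1) = (11, 2)$ it produces $(1, 1)$, $(21, 23)$, $(461, 505)$, and so on.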
2.6.1 Problems for Practice


1. A triangular number is a number of the form $1 + 2 + \cdots + n$ for some positive integer $n$. Find all triangular numbers which are perfect squares.

2. Find all positive integers $n$ such that $n + 1$ and $3n + 1$ are simultaneously perfect squares.

3. Find all integers $a, b$ such that $a^2 + b^2 = 1 + 4ab$.

4. The difference of two consecutive cubes equals $n^2$ for some positive integer $n$. Prove that $2n - 1$ is a perfect square.

5. Find all triangles whose side lengths are consecutive integers and whose area is an integer.




Chapter 3 
Matrices and Linear Equations 


Abstract This chapter introduces and studies the reduced row-echelon form of 
a matrix, and applies it to the resolution of linear systems of equations and the 
computation of the inverse of a matrix. The approach is algorithmic. 


Keywords Linear systems • Homogeneous systems • Row-echelon form • Gaussian reduction


The resolution of linear systems of equations is definitely one of the key motivations of linear algebra. In this chapter we explain an algorithmic procedure which allows the resolution of linear systems of equations, by performing some simple operations on matrices. We consider this problem as a motivation for the introduction of basic operations on the rows (or columns) of matrices. A much deeper study of these objects will be done in later chapters, using a more abstract (and much more powerful) setup. We will fix a field $F$ in the following discussion, which the reader may take to be $\mathbb{R}$ or $\mathbb{C}$.


3.1 Linear Systems: The Basic Vocabulary 


A linear equation in the variables $x_1, \ldots, x_n$ is an equation of the form

$$a_1x_1 + \cdots + a_nx_n = b,$$

where $a_1, \ldots, a_n, b \in F$ are given scalars and $n$ is a given positive integer. The unknowns $x_1, \ldots, x_n$ are supposed to be elements of $F$.
A linear system in the variables $x_1, \ldots, x_n$ is a family of linear equations, usually written as

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \quad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \end{cases} \tag{3.1}$$
Here $a_{11}, a_{12}, \ldots, a_{mn}$ and $b_1, \ldots, b_m$ are given scalars. There is a much shorter notation for the previous system, using matrices and vectors: denoting by $X$ the (column) vector with coordinates $x_1, \ldots, x_n$, by $A$ the matrix $[a_{ij}]_{1 \leq i \leq m,\, 1 \leq j \leq n}$, and by $b$ the (column) vector whose coordinates are $b_1, \ldots, b_m$, the system can be rewritten as

$$AX = b. \tag{3.2}$$

Finally, we can rewrite the system in terms of vectors: if $C_1, \ldots, C_n$ are the columns of the matrix $A$, seen as vectors in $F^m$ (written in column form), the system is equivalent to

$$x_1C_1 + x_2C_2 + \cdots + x_nC_n = b. \tag{3.3}$$

Definition 3.1. (a) The linear system (3.1) is called homogeneous if $b_1 = \cdots = b_m = 0$.

(b) The homogeneous linear system associated with the system (3.2) is the system $AX = 0$.


Thus a homogeneous system is one of the form AX = 0 for some matrix A. For 
the resolution of linear systems, homogeneous systems play a crucial role, thanks to 
the following proposition, which shows that solving a general linear system reduces 
to finding one solution and then solving a homogeneous linear system. 


Proposition 3.2 (Superposition Principle). Let $A \in M_{m,n}(F)$ and $b \in F^m$. Let $S \subset F^n$ be the set of solutions of the homogeneous linear system $AX = 0$. If the system $AX = b$ has a solution $X_0$, then the set of solutions of this system is $X_0 + S$.

Proof. By assumption $AX_0 = b$. Now the relation $AX = b$ is equivalent to $AX = AX_0$, or $A(X - X_0) = 0$. Thus a vector $X$ is a solution of the system $AX = b$ if and only if $X - X_0$ is a solution of the homogeneous system $AY = 0$, i.e., $X - X_0 \in S$. This is equivalent to $X \in X_0 + S$. □


Definition 3.3. A linear system is called consistent if it has at least one solution. It 
is called inconsistent if it is not consistent, i.e., it has no solution. 


Let us introduce a final definition for this section: 


Definition 3.4. (a) Two linear systems are equivalent if they have exactly the 
same set of solutions. 

(b) Let A, B be matrices of the same size. If the systems AX = 0 and BX = 0 are 
equivalent, we write A ~ B. 


Remark 3.5. (a) Typical examples of inconsistent linear systems are

$$\begin{cases} x_1 = 0 \\ x_1 = 1 \end{cases} \qquad\text{or}\qquad \begin{cases} x_1 - 2x_2 = 1 \\ 2x_2 - x_1 = 0 \end{cases}$$

(b) Note that homogeneous systems are always consistent: any homogeneous system has an obvious solution, namely the vector whose coordinates are all equal to 0. We will call this the trivial solution. It follows from Proposition 3.2 that if the system $AX = b$ is consistent, then it has a unique solution if and only if the associated homogeneous system $AX = 0$ has only the trivial solution.


3.1.1 Problems for Practice 


1. For which real numbers $a$ is the system

$$\begin{cases} x_1 + 2x_2 = 1 \\ 3x_1 + 6x_2 = a \end{cases}$$

consistent? Solve the system in this case.

2. Find all real numbers $a$ and $b$ for which the systems

$$\begin{cases} x_1 + 2x_2 = 3 \\ -x_1 + 3x_2 = 1 \end{cases} \qquad\text{and}\qquad \begin{cases} x_1 + ax_2 = 2 \\ -x_1 + 2x_2 = b \end{cases}$$

are equivalent.

3. Let $a, b$ be real numbers, not both equal to 0.

(a) Prove that the system

$$\begin{cases} ax_1 + bx_2 = 0 \\ -bx_1 + ax_2 = 0 \end{cases}$$

has only the trivial solution.
(b) Prove that for all real numbers $c, d$ the system

$$\begin{cases} ax_1 + bx_2 = c \\ -bx_1 + ax_2 = d \end{cases}$$

has a unique solution and find this solution in terms of $a, b, c, d$.

4. Let $A \in M_2(\mathbb{C})$ be a matrix and consider the homogeneous system $AX = 0$. Prove that the following statements are equivalent:

(a) This system has only the trivial solution.
(b) $A$ is invertible.

5. Let $A$ and $B$ be $n \times n$ matrices such that the system $ABX = 0$ has only the trivial solution. Show that the system $BX = 0$ also has only the trivial solution.

6. Let $C$ and $D$ be $n \times n$ matrices such that the system $CDX = b$ is consistent for every choice of a vector $b$ in $\mathbb{R}^n$. Show that the system $CY = b$ is consistent for every choice of a vector $b$ in $\mathbb{R}^n$.

7. Let $A \in M_n(F)$ be an invertible matrix with entries in a field $F$. Prove that for all $b \in F^n$ the system $AX = b$ is consistent (the converse holds but the proof is much harder, see Theorem 3.25).


3.2 The Reduced Row-Echelon Form and Its Relevance to Linear Systems


Consider a matrix $A$ with entries in a field $F$. If $R$ is a row of $A$, we say that $R$ is zero if all entries in row $R$ are equal to 0. If $R$ is nonzero, the leading entry of $R$, or the pivot of $R$, is the first nonzero entry in that row. We say that $A$ is in reduced row-echelon form if $A$ has the following properties:

(1) All zero rows of $A$ are at the bottom of $A$ (so no nonzero row can lie below a zero row).
(2) The pivot in a nonzero row is strictly to the right of the pivot in the row above.
(3) In any nonzero row, the pivot equals 1 and it is the only nonzero element in its column.

For instance, the matrix $I_n$ is in reduced row-echelon form, and so is the matrix $O_n$. The matrix

$$A = \begin{bmatrix} 1 & -2 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \tag{3.4}$$

is in reduced row-echelon form, but the slightly different matrix

$$B = \begin{bmatrix} 1 & -2 & 1 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

is not in reduced row-echelon form, as the pivot for the second row is not the only nonzero entry in its column.
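Properties (1), (2) and (3) translate directly into a mechanical check. The following Python sketch (the function name `is_rref` is ours) tests a matrix given as a list of rows; it assumes exact entries such as integers or `fractions.Fraction`, since a floating-point version would need a tolerance when comparing with 0 and 1.

```python
def is_rref(M):
    """Return True if the matrix M (a list of rows) is in reduced row-echelon form."""
    last_pivot_col = -1
    seen_zero_row = False
    for i, row in enumerate(M):
        nonzero = [j for j, entry in enumerate(row) if entry != 0]
        if not nonzero:
            seen_zero_row = True
            continue
        if seen_zero_row:  # (1) a nonzero row lies below a zero row
            return False
        j = nonzero[0]  # column of the pivot of this row
        if j <= last_pivot_col:  # (2) pivot not strictly right of the previous pivot
            return False
        if row[j] != 1:  # (3) the pivot must equal 1 ...
            return False
        if any(M[k][j] != 0 for k in range(len(M)) if k != i):  # ... and be alone in its column
            return False
        last_pivot_col = j
    return True
```

On the two matrices above it accepts $A$ and rejects $B$, the failure for $B$ coming from property (3) in the third column.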

What is the relevance of this very special form of matrices with respect to the original problem, consisting in solving linear systems of equations? We will see in the next sections that any matrix can be put (in an algorithmic way) in reduced row-echelon form and that this form is unique. Also, we will see that if $A_{\text{ref}}$ is the reduced row-echelon form of $A$, then the systems $AX = 0$ and $A_{\text{ref}}X = 0$ are equivalent. Moreover, it is very easy to solve the system $A_{\text{ref}}X = 0$ since $A_{\text{ref}}$ is in reduced row-echelon form.


Example 3.6. Let us solve the system $AX = 0$, where $A$ is the reduced row-echelon matrix given in relation (3.4). The system is

$$\begin{cases} x_1 - 2x_2 - x_4 = 0 \\ x_3 + x_4 = 0 \end{cases}$$

We can simply express $x_3 = -x_4$ and $x_1 = 2x_2 + x_4$, thus the general solution of the system is

$$(2a + b,\ a,\ -b,\ b)$$

with $a, b \in F$.


In general, consider a matrix A which is in reduced row-echelon form and let us see how to solve the system AX = 0. The only meaningful equations are those given by the nonzero rows of A (recall that all zero rows of A are at the bottom). Suppose that the ith row of A is nonzero for some i and let the pivot of that row be in column j, so that the pivot is $a_{ij} = 1$. The ith equation of the linear system is then of the form

$$x_j + \sum_{k=j+1}^{n} a_{ik} x_k = 0.$$

We call $x_j$ the pivot variable of the row $L_i$. So to each nonzero row we associate a unique pivot variable. All the other variables of the system are called free variables.
One solves the system starting from the bottom, by successively expressing the pivot variables in terms of free variables. This yields the general solution of the system, in terms of free variables, which can take any value in F. If $y_1, \dots, y_s$ are the free variables, then the solutions of the system will be of the form

$$X = \begin{pmatrix} b_{11}y_1 + b_{12}y_2 + \dots + b_{1s}y_s \\ b_{21}y_1 + b_{22}y_2 + \dots + b_{2s}y_s \\ \vdots \\ b_{n1}y_1 + b_{n2}y_2 + \dots + b_{ns}y_s \end{pmatrix}$$

for some scalars $b_{ij}$. This can also be written as

$$X = y_1 \begin{pmatrix} b_{11} \\ b_{21} \\ \vdots \\ b_{n1} \end{pmatrix} + \dots + y_s \begin{pmatrix} b_{1s} \\ b_{2s} \\ \vdots \\ b_{ns} \end{pmatrix}.$$

We call

$$Y_1 = \begin{pmatrix} b_{11} \\ b_{21} \\ \vdots \\ b_{n1} \end{pmatrix}, \quad \dots, \quad Y_s = \begin{pmatrix} b_{1s} \\ b_{2s} \\ \vdots \\ b_{ns} \end{pmatrix}$$

the fundamental solutions of the system AX = 0. The motivation for their name is easy to understand: $Y_1, \dots, Y_s$ are solutions of the system AX = 0 which “generate” all other solutions, in the sense that all solutions of the system AX = 0 are obtained by all possible linear combinations of $Y_1, \dots, Y_s$ (corresponding to all possible values that the free variables $y_1, \dots, y_s$ can take).


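Example 3.7. Let us consider the matrix in reduced row-echelon form

$$A = \begin{pmatrix} 1 & 1 & 0 & 0 & -1 & 0 & 2 \\ 0 & 0 & 1 & 0 & 3 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

and the associated homogeneous linear system AX = 0. This can be written as

$$\begin{cases} x_1 + x_2 - x_5 + 2x_7 = 0 \\ x_3 + 3x_5 + x_7 = 0 \\ x_4 - x_7 = 0 \\ x_6 = 0 \end{cases}$$

The pivot variables are $x_1, x_3, x_4, x_6$, as the pivots appear in columns 1, 3, 4, 6. So the free variables are $x_2, x_5, x_7$. Next, we solve the system starting with the last equation and going up, at each step expressing the pivot variables in terms of free variables. The last equation gives $x_6 = 0$. Next, we obtain $x_4 = x_7$, then $x_3 = -3x_5 - x_7$ and $x_1 = -x_2 + x_5 - 2x_7$. Thus

$$X = \begin{pmatrix} -x_2 + x_5 - 2x_7 \\ x_2 \\ -3x_5 - x_7 \\ x_7 \\ x_5 \\ 0 \\ x_7 \end{pmatrix} = x_2 \begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + x_5 \begin{pmatrix} 1 \\ 0 \\ -3 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_7 \begin{pmatrix} -2 \\ 0 \\ -1 \\ 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$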


The three column vectors appearing in the right-hand side are the fundamental 
solutions of the system AX = 0. All solutions of the system are given by all possible 
linear combinations of the three fundamental solutions. 
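The procedure just described can be sketched in Python: read off the pivot columns, then for each free variable set that variable to 1 and the other free variables to 0. This is an illustrative sketch, not part of the text. The function name `fundamental_solutions` is hypothetical. The input is assumed to be already in reduced row-echelon form and is stored as a list of rows. Exact arithmetic comes from the standard `fractions` module.

```python
from fractions import Fraction


def fundamental_solutions(R):
    """Fundamental solutions of RX = 0, assuming R (a list of rows) is already in
    reduced row-echelon form. One solution is produced per free variable."""
    n = len(R[0])
    # Pivot column of each nonzero row; zero rows sit at the bottom, so the
    # i-th entry of this list belongs to row i of R.
    pivots = []
    for row in R:
        nonzero = [j for j, entry in enumerate(row) if entry != 0]
        if nonzero:
            pivots.append(nonzero[0])
    free = [j for j in range(n) if j not in pivots]
    solutions = []
    for f in free:
        Y = [Fraction(0)] * n
        Y[f] = Fraction(1)  # this free variable is 1, the other free variables are 0
        # Row i reads x_{p_i} + (terms in free variables) = 0, because every
        # other pivot column has a 0 in row i.
        for i, p in enumerate(pivots):
            Y[p] = -Fraction(R[i][f])
        solutions.append(Y)
    return solutions


if __name__ == "__main__":
    # The matrix of Example 3.7.
    A = [[1, 1, 0, 0, -1, 0, 2],
         [0, 0, 1, 0, 3, 0, 1],
         [0, 0, 0, 1, 0, 0, -1],
         [0, 0, 0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 0, 0]]
    for Y in fundamental_solutions(A):
        print(tuple(int(y) for y in Y))
    # (-1, 1, 0, 0, 0, 0, 0)
    # (1, 0, -3, 0, 1, 0, 0)
    # (-2, 0, -1, 1, 0, 0, 1)
```

The three printed vectors are the fundamental solutions found by hand in Example 3.7.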


The number of fundamental solutions of the system AX = 0 is the total 
number of variables minus the number of pivot variables. We deduce that the system 
AX = 0 has the unique solution X = 0 if and only if there are no free variables, 
or equivalently every variable is a pivot variable. This is the same as saying that 
the number of pivot variables equals the number of columns of A. Combining these
observations with the superposition principle (Proposition 3.2), we obtain the following
very important result:


Theorem 3.8. (a) A homogeneous linear system having more variables than 
equations has nontrivial solutions. If the field containing the coefficients of the 
equations is infinite (for instance R or C), then the system has infinitely many 
solutions. 

(b) A consistent linear system AX = b having more variables than equations has
at least two solutions and, if the field F is infinite (for instance F = R or F = C),
then it has infinitely many solutions.


We turn now to the fundamental problem of transforming a matrix into a reduced 
row-echelon form matrix. In order to solve this problem we introduce three types of 
simple operations that can be applied to the rows of a matrix. We will see that one 
can use these operations to transform any matrix into a reduced row-echelon form 
matrix. These operations have a very simple motivation from the point of view of 
linear systems: the most natural operations that one would do in order to solve a 
linear system are: 


• multiplying an equation by a nonzero scalar;
• adding a multiple of an equation to a second (and different) equation;
• interchanging two equations.
Note that these operations are reversible: for example, the inverse operation of
multiplication of an equation by a nonzero scalar a is multiplication of that equation 
by the inverse of a. It is therefore clear that by performing any finite sequence 
of such operations on a linear system we obtain a new linear system which has 
precisely the same set of solutions as the original one (i.e., a new linear system 
which is equivalent to the original one). These operations on equations of the system 
can be seen as operations on the matrix associated with the system. More precisely: 


Definition 3.9. An elementary operation on the rows of a matrix A (or elemen- 
tary row operation) is an operation of one of the following types: 


(1) row swaps: interchanging two rows of the matrix A. 

(2) row scaling: multiplying a row of A by a nonzero scalar. 

(3) transvection: replacing a row L by L + cL’ for some scalar c and some row L’
of A, different from L.


The previous discussion shows that if A is a matrix and B is obtained from 
A by a sequence of elementary row operations, then A ~ B, where we recall 
(Definition 3.4) that this simply means that the systems AX = 0 and BX = 0 
are equivalent. 

Corresponding to these operations, we define elementary matrices: 


Definition 3.10. A matrix $A \in M_n(F)$ is called an elementary matrix if it is
obtained from $I_n$ by performing exactly one elementary row operation.


Note that elementary matrices have the same number of rows and columns. There
are three types of elementary matrices:

(1) Transposition matrices: those obtained from $I_n$ by interchanging two of its rows.

(2) Dilation matrices: those obtained from $I_n$ by multiplying one of its rows by a
nonzero scalar.

(3) Transvection matrices: those obtained from $I_n$ by adding to a row a multiple of
a second (and different) row.


A simple, but absolutely crucial observation is the following: 


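Proposition 3.11. Let $A \in M_{m,n}(F)$ be a matrix. Performing an elementary row operation on A is equivalent to multiplying A on the left by the elementary matrix corresponding to that operation.

Proof. If E is any m × m matrix and $A \in M_{m,n}(F)$, then the ith row of EA is $e_{i1}L_1 + e_{i2}L_2 + \dots + e_{im}L_m$, where $L_1, \dots, L_m$ are the rows of A and $e_{ij}$ are the entries of E. For instance, if E is obtained from $I_m$ by adding c times row j to row i, then the ith row of EA is $L_i + cL_j$ and every other row of EA equals the corresponding row of A. The other two types of elementary matrices are handled in the same way, and the result follows readily from the definitions. □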
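Proposition 3.11 is easy to test on a concrete matrix. The Python sketch below is our own illustration, and all function names in it are hypothetical. It builds the three kinds of elementary matrices by applying one row operation to $I_m$. It then checks that multiplying on the left by each of them has the same effect as performing the row operation directly. Rows are indexed from 0, as is usual in Python.

```python
def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def matmul(E, A):
    """Product EA of two matrices stored as lists of rows."""
    return [[sum(E[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(E))]


# Elementary matrices: apply one row operation to I_m.
def swap_matrix(m, i, j):
    E = identity(m)
    E[i], E[j] = E[j], E[i]
    return E


def scale_matrix(m, i, c):
    E = identity(m)
    E[i][i] = c          # c must be nonzero
    return E


def transvection_matrix(m, i, j, c):
    E = identity(m)
    E[i][j] = c          # row i of I_m becomes e_i + c * e_j (i != j)
    return E


# The same operations performed directly on the rows of A.
def swap_rows(A, i, j):
    B = [row[:] for row in A]
    B[i], B[j] = B[j], B[i]
    return B


def scale_row(A, i, c):
    B = [row[:] for row in A]
    B[i] = [c * x for x in B[i]]
    return B


def add_multiple_of_row(A, i, j, c):
    B = [row[:] for row in A]
    B[i] = [x + c * y for x, y in zip(B[i], B[j])]
    return B


if __name__ == "__main__":
    A = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]   # a sample 3 x 4 matrix
    m = len(A)
    assert matmul(swap_matrix(m, 0, 2), A) == swap_rows(A, 0, 2)
    assert matmul(scale_matrix(m, 1, -3), A) == scale_row(A, 1, -3)
    assert matmul(transvection_matrix(m, 2, 0, 5), A) == add_multiple_of_row(A, 2, 0, 5)
    print("left multiplication agrees with the row operations")
```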


We now reach the most important theorem of this chapter: it is one of the
most important theorems in linear algebra, since it yields algorithmic ways of
solving many practical problems concerning linear systems, invertibility of
matrices, linear independence of vectors, etc.

Theorem 3.12. Any matrix $A \in M_{m,n}(F)$ can be put into reduced row-echelon
form by performing elementary row operations.


Proof. The proof is algorithmic. Start with any matrix A and consider its first column. If it is zero, then pass directly to the next column. Suppose that the first column $C_1$ is nonzero and consider its first nonzero entry, say $a_{i1}$. Then interchange rows $L_1$ and $L_i$ (if i = 1 we skip this operation), so that in the new matrix we have a nonzero entry x in position (1, 1). Multiply the first row by 1/x to obtain a new matrix, in which the entry in position (1, 1) is 1. Using transvections, we can make all entries in the first column below the (1, 1) entry equal to 0: for i ≥ 2, subtract $b_{i1}$ times the first row from the ith row, where $b_{i1}$ is the entry in position (i, 1). Thus after some elementary row operations we end up with a matrix B whose first column is either 0 or has a pivot in position (1, 1) and zeros elsewhere.

Next, we move to the second column $C_2$ of this new matrix B. If every entry below $b_{12}$ is zero, go directly to the third column of B. Suppose that some entry below $b_{12}$ is nonzero. By possibly swapping the second row and a suitable other row (corresponding to the first nonzero entry below $b_{12}$), we may assume that $b_{22} \neq 0$. Multiply the second row by $1/b_{22}$ so that the entry in position (2, 2) becomes 1. Now make the other entries in the second column zero by transvections. We now have pivots equal to 1 in the first and second columns. Needless to say, we continue this process with each subsequent column and we end up with a matrix in reduced row-echelon form. □


Remark 3.13. The algorithm used in the proof of the previous theorem is called 
Gaussian reduction or row-reduction. 
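The proof of Theorem 3.12 translates almost line by line into code. The Python function below is a minimal sketch. The name `rref` is our own choice, not notation from the text. It works over $\mathbb{Q}$ with exact `Fraction` arithmetic, treats the columns one at a time, and uses only the three elementary row operations. It keeps track of the row that is to receive the next pivot, which is how a zero column is skipped without using up a row.

```python
from fractions import Fraction


def rref(A):
    """Reduced row-echelon form of A (a list of rows), computed with exact fractions.

    Column by column: find a nonzero entry at or below the current pivot row,
    swap it up, scale it to 1, then clear the rest of its column.
    """
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the row that will receive the next pivot
    for c in range(cols):
        if r == rows:
            break
        # First row at or below r with a nonzero entry in column c.
        k = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if k is None:
            continue                          # zero column below row r: move on
        M[r], M[k] = M[k], M[r]               # row swap
        p = M[r][c]
        M[r] = [x / p for x in M[r]]          # row scaling: the pivot becomes 1
        for i in range(rows):                 # transvections clear the column
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return M


if __name__ == "__main__":
    A = [[0, 1, 2, 3, 4], [-1, 0, 1, 2, 3], [0, 1, 1, 1, 1], [3, 1, -1, 0, 2]]  # Example 3.16
    for row in rref(A):
        print(" ".join(str(x) for x in row))
    # 1 0 0 0 0
    # 0 1 0 0 1/3
    # 0 0 1 0 -5/3
    # 0 0 0 1 7/3
```

Exact fractions are used so that the output can be compared with hand computations such as Example 3.16. Floating-point arithmetic would introduce rounding in the entries 1/3, −5/3, 7/3.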


By combining the Gaussian reduction theorem (Theorem 3.12) and Proposi- 
tion 3.11 we obtain the following result, which will be constantly used in the next 
section: 


Proposition 3.14. For any matrix $A \in M_{m,n}(F)$ we can find a matrix $B \in M_m(F)$
which is a product of elementary matrices, such that $A_{\mathrm{ref}} = BA$.


Remark 3.15. In order to find the matrix B in practice, the best way is to row-reduce
the matrix $[A \mid I_m]$ if A is m × n. The row-reduction will yield the matrix $[A_{\mathrm{ref}} \mid B]$,
as the reader can check.
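Remark 3.15 can be checked with a small variant of the reduction. Append $I_m$ to the right of A, look for pivots only in the first n columns, and split the result. The sketch below is again our own illustration, and the name `rref_with_transform` is hypothetical. Every row operation acts on both blocks, so the right-hand block accumulates the product B of the elementary matrices used. The usage lines verify BA = $A_{\mathrm{ref}}$ on the matrix of Example 3.16.

```python
from fractions import Fraction


def rref_with_transform(A):
    """Row-reduce [A | I_m]; return (A_ref, B) where B is the right-hand block.

    Pivots are searched for only in the n columns of A. Every row operation is
    applied to the whole augmented row, so the right-hand block ends up equal to
    the product of the elementary matrices used, and B A = A_ref.
    """
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(A)]
    r = 0
    for c in range(n):
        if r == m:
            break
        k = next((i for i in range(r, m) if M[i][c] != 0), None)
        if k is None:
            continue
        M[r], M[k] = M[k], M[r]
        p = M[r][c]
        M[r] = [x / p for x in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return [row[:n] for row in M], [row[n:] for row in M]


if __name__ == "__main__":
    A = [[0, 1, 2, 3, 4], [-1, 0, 1, 2, 3], [0, 1, 1, 1, 1], [3, 1, -1, 0, 2]]
    A_ref, B = rref_with_transform(A)
    BA = [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
          for i in range(len(B))]
    print(BA == A_ref)   # True
```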


Example 3.16. Let us perform the Gaussian reduction on the matrix

$$A = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ -1 & 0 & 1 & 2 & 3 \\ 0 & 1 & 1 & 1 & 1 \\ 3 & 1 & -1 & 0 & 2 \end{pmatrix} \in M_{4,5}(\mathbb{R}).$$

The first nonzero entry in column $C_1$ appears in position (2, 1) and equals −1, so we swap the first and second rows, then we multiply the new first row by −1 to get a pivot equal to 1 in the first row. We end up with the matrix

$$\begin{pmatrix} 1 & 0 & -1 & -2 & -3 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 1 & 1 \\ 3 & 1 & -1 & 0 & 2 \end{pmatrix}.$$

We make zeros elsewhere in the first column by subtracting three times the first row from the last row. The new matrix is

$$\begin{pmatrix} 1 & 0 & -1 & -2 & -3 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 6 & 11 \end{pmatrix}.$$

Since we are done with the first column, we go on to the second one. The entry in position (2, 2) is already equal to 1, so we don’t need to swap rows or to scale them. Thus we directly make zeros elsewhere in the second column, so that the only nonzero entry is the 1 in position (2, 2). For this, we subtract the second row from the third and the fourth. The new matrix is

$$\begin{pmatrix} 1 & 0 & -1 & -2 & -3 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & -1 & -2 & -3 \\ 0 & 0 & 0 & 3 & 7 \end{pmatrix}.$$

We next consider the third column. The first nonzero entry below the entry in position (2, 3) is −1, so we multiply the third row by −1 and then make the 1 in position (3, 3) the only nonzero entry in that column by transvections. We end up with

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 & -2 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 3 & 7 \end{pmatrix}.$$

We repeat the procedure with the fourth column: we multiply the last row by 1/3 (so that the first nonzero entry below the one in position (3, 4) becomes 1, our pivot) and then make the entry in position (4, 4) the only nonzero entry in its column by transvections. The final matrix is the reduced row-echelon form of A, namely

$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1/3 \\ 0 & 0 & 1 & 0 & -5/3 \\ 0 & 0 & 0 & 1 & 7/3 \end{pmatrix}.$$
Problem 3.17. Solve the homogeneous linear system AX = 0, where A is the matrix from the previous example.

Solution. Since the systems AX = 0 and $A_{\mathrm{ref}}X = 0$ are equivalent, it suffices to solve the latter system. The pivot variables are $x_1, x_2, x_3, x_4$ and the free variable is $x_5$. The system $A_{\mathrm{ref}}X = 0$ is given by

$$\begin{cases} x_1 = 0 \\ x_2 + \frac{1}{3}x_5 = 0 \\ x_3 - \frac{5}{3}x_5 = 0 \\ x_4 + \frac{7}{3}x_5 = 0 \end{cases}$$

The resolution is then immediate and gives the solutions

$$\left(0,\ -\tfrac{1}{3}t,\ \tfrac{5}{3}t,\ -\tfrac{7}{3}t,\ t\right), \quad t \in \mathbb{R}.$$

□


3.2.1 Problems for Practice 


1. Find the reduced row-echelon form of the matrix with real entries

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 4 & 5 & 6 \\ 3 & 4 & 5 & 6 & 7 \end{pmatrix}.$$


2. Implement the Gaussian reduction algorithm on the matrix

$$\begin{pmatrix} 0 & 2 & 1 & 1 & 2 \\ 1 & 1 & 0 & 2 & 1 \\ -3 & 1 & 1 & 0 & 2 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix}.$$


3. Determine the fundamental solutions of the homogeneous linear system of
equations AX = 0, where A is the matrix

$$A = \begin{pmatrix} 1 & -2 & 1 & 0 \\ -2 & 4 & 0 & 2 \\ -1 & 2 & 1 & 2 \end{pmatrix}.$$
4. (a) Write the solutions of the homogeneous system of equations AX = 0 in
parametric vector form, where

$$A = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & -1 \\ -1 & 1 & -4 \end{pmatrix}.$$

(b) Find a solution to the system for which the sum of the first two coordinates is 1.
5. Solve the homogeneous system

$$\begin{cases} x + 2y - 3z = 0 \\ 2x + 5y + 2z = 0 \\ 3x - y - 4z = 0 \end{cases}$$


6. Show that the homogeneous system of equations AX = 0 has nontrivial 
solutions, where 


2 
1 
3 
1 


Then determine a matrix B of size 4 × 3 obtained from A by erasing one of its
columns such that the system BY = 0 has only the trivial solution.
7. Let n > 2 be an integer. Solve in real numbers the linear system


Xi + X3 X2 + X4 Xn—2 + Xn 
MS a AEE e.’ Inte a 


3.3 Solving the System AX = b 


Consider a linear system AX = b with $A \in M_{m,n}(F)$ and $b \in F^m$, in the variables
$x_1, \dots, x_n$, which are the coordinates of the vector $X \in F^n$. In order to solve
this system, we consider the augmented matrix (A|b) obtained by adding to the
matrix A a new column (at the right), given by the coordinates of the vector b.
Elementary row operations on the equations of the system come down to elementary 
row operations on the augmented matrix, thus in order to solve the system we can 
first transform (A|b) into its reduced row-echelon form by the Gaussian reduction 
algorithm, then solve the new (much easier) linear system. The key point is the 
following easy but important observation: 
Proposition 3.18. Consider the linear system AX = b. Suppose that the matrix
(A’|b’) is obtained from the augmented matrix (A|b) by a sequence of elementary 
row operations. Then the systems AX = b and A'X = b' are equivalent, i.e., they 
have exactly the same set of solutions. 


Proof. As we have already noticed, performing elementary row operations on (A|b) 
comes down to performing elementary operations on the equations of the system 
AX = b, and these do not change the set of solutions, as they are reversible. □


We now reach the second fundamental theorem of this chapter, the existence and 
uniqueness theorem. 


Theorem 3.19. Assume that (A|b) has been brought to a reduced row-echelon form 
(A’|b’) by elementary row operations. 


(a) The system AX = b is consistent if and only if (A’|b’) does not have a pivot in
the last column.

(b) If the system is consistent, then it has a unique solution if and only if A’ has 
pivots in every column. 


Proof. (a) Assume that (A’|b’) has a pivot in the last column. If the pivot appears in row i, then the ith row of (A’|b’) is of the form (0, ..., 0, 1). Thus among the equations of the system A’X = b’ we have the equation $0x_1 + \dots + 0x_n = 1$, which has no solution. Thus the system A’X = b’ has no solution and so the system AX = b is not consistent.

Conversely, suppose that (A’|b’) does not have a pivot in the last column. Say A’ has pivots in columns $j_1 < \dots < j_k \le n$ and call $x_{j_1}, \dots, x_{j_k}$ the pivot variables, and all other variables the free variables. Give the value 0 to all free variables, getting in this way a system in the variables $x_{j_1}, \dots, x_{j_k}$. This system is triangular and can be solved successively from the bottom, by first finding $x_{j_k}$, then $x_{j_{k-1}}$, ..., then $x_{j_1}$. In particular, the system has a solution and so the system AX = b is consistent.

(b) Since we can give any value to the free variables, the argument in the second paragraph of the proof of (a) shows that the solution is unique if and only if there are no free variables, or equivalently if and only if A’ has a pivot in every column. □


For simplicity, assume that F = R, i.e., the coefficients of the equations of the 
linear system AX = b are real numbers. In order to find the number of solutions 
of the system, we proceed as follows. First, we consider the augmented matrix 
[A|b] and perform the Gaussian reduction on it to reach a matrix [A’|b’]. If this 
new matrix has a row of the form (0, 0, ..., 0 | c) for some nonzero real number c,
then the system is inconsistent. If this is not the case, then we check whether every 
column of A’ has a pivot. If this is the case, then the system has a unique solution. 
If not, then the system has infinitely many solutions. 
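The recipe above can be turned into a short decision procedure. The following Python sketch row-reduces [A|b], locates the pivots, and applies Theorem 3.19. The function names are hypothetical and exact arithmetic comes from `fractions`. As in the text, the answer "infinitely many" presumes an infinite field such as $\mathbb{Q}$ or $\mathbb{R}$.

```python
from fractions import Fraction


def rref(A):
    """Reduced row-echelon form of A (a list of rows), with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # row that receives the next pivot
    for c in range(cols):
        if r == rows:
            break
        k = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if k is None:
            continue
        M[r], M[k] = M[k], M[r]
        p = M[r][c]
        M[r] = [x / p for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return M


def count_solutions(A, b):
    """Classify AX = b as 'none', 'unique' or 'infinitely many' (Theorem 3.19),
    assuming an infinite field such as Q or R."""
    n = len(A[0])
    R = rref([list(row) + [bi] for row, bi in zip(A, b)])
    pivot_columns = []
    for row in R:
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if nonzero:
            pivot_columns.append(nonzero[0])
    if n in pivot_columns:        # a row (0, ..., 0 | c) with c != 0
        return "none"
    if len(pivot_columns) == n:   # every column of A' has a pivot
        return "unique"
    return "infinitely many"


if __name__ == "__main__":
    A = [[1, 2, 2], [0, 1, 1], [2, 4, 4]]               # the matrix of Problem 3.20
    print(count_solutions(A, [1, 0, 2]))               # infinitely many (b3 = 2 b1)
    print(count_solutions(A, [1, 0, 0]))               # none (b3 != 2 b1)
    print(count_solutions([[1, 1], [1, -1]], [2, 0]))  # unique
```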
Problem 3.20. Let us consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 1 & 1 \\ 2 & 4 & 4 \end{pmatrix}.$$

Given a vector $b \in \mathbb{R}^3$, find a necessary and sufficient condition in terms of the coordinates of b such that the system AX = b is consistent.

Solution. The augmented matrix of the system is

$$[A \mid b] = \left(\begin{array}{ccc|c} 1 & 2 & 2 & b_1 \\ 0 & 1 & 1 & b_2 \\ 2 & 4 & 4 & b_3 \end{array}\right).$$

In order to obtain its row-reduction, we subtract twice the first row from the third one, and in the new matrix we subtract twice the second row from the first one. We end up with

$$[A \mid b] \sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & b_1 - 2b_2 \\ 0 & 1 & 1 & b_2 \\ 0 & 0 & 0 & b_3 - 2b_1 \end{array}\right).$$

By the previous theorem, the system AX = b is consistent if and only if this last matrix has no pivot in the last column, which is equivalent to $b_3 = 2b_1$. □


Using the fact that for two matrices A, B differing by a sequence of elementary 
row operations the systems AX = 0 and BX = 0 are equivalent, we can give a 
proof of the uniqueness of the reduced row-echelon form of a matrix. The following 
simple and elegant proof of this nontrivial theorem is due to Thomas Yuster¹.


Theorem 3.21. The reduced row-echelon form of a matrix is unique. 


Proof. The proof goes by induction on the number n of columns of the matrix $A \in M_{m,n}(F)$. The result being clear for n = 1, assume that n > 1 and that the result holds for n − 1. Let $A \in M_{m,n}(F)$ and let A’ be the matrix obtained from A by deleting the nth column. Suppose that B and C are two distinct reduced row-echelon forms of A. Since any sequence of elementary row operations bringing A to a reduced row-echelon form also brings A’ to a reduced row-echelon form, by applying the induction hypothesis we know that B and C differ in the nth column only.

Let j be such that $b_{jn} \neq c_{jn}$ (such j exists by the previous paragraph and the assumption that B ≠ C). If X is a vector such that BX = 0, then CX = 0 (as the systems BX = 0 and CX = 0 are equivalent to the system AX = 0), so that (B − C)X = 0. Since B and C differ in the nth column only, the jth equation of the system (B − C)X = 0 reads $(b_{jn} - c_{jn})x_n = 0$ and so $x_n = 0$ whenever BX = 0 or CX = 0. It follows that $x_n$ is not a free variable for B and C and thus B and C must have a pivot in the nth column. Again, since B and C only differ in the last column and since they are in reduced row-echelon form, the row in which the pivot in the last column appears is the same for B and C. Since all other entries in the last column of B and C are equal to 0 (as B and C are in reduced echelon form), we conclude that B and C have the same nth column, contradicting the fact that $b_{jn} \neq c_{jn}$. Thus B = C and the inductive step is completed, proving the desired result. □

¹See the article “The reduced row-echelon form of a matrix is unique: a simple proof”, Math. Magazine, Vol. 57, No. 2, Mar. 1984, pp. 93-94.


3.3.1 Problems for Practice 


1. Write down the solution set of the linear system 


xX; —3x2 —2x3 = —5 


ll 
P 


X2 —X3 
—2x\ +3x2 +7x3 =-—2 


in parametric vector form. 

2. Let A be a matrix of size m × n and let b and c be two vectors in $\mathbb{R}^m$ such that the
system AX = b has a unique solution and the system AX = c has no solution.
Explain why m > n must hold.

3. Find a necessary and sufficient condition on the coordinates of the vector $b \in \mathbb{R}^4$
for the system AX = b to be consistent, where

$$A = \begin{pmatrix} 3 & -6 & 2 & -1 \\ -2 & 4 & 1 & 3 \\ 0 & 0 & 1 & 1 \\ 1 & -2 & 1 & 0 \end{pmatrix}.$$


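4. Find x, y, z and w so that

$$\begin{pmatrix} x & 3 \\ y & 4 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ z & w \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Find one solution with x positive and one with x negative.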


5. Explain why a linear system of 10 equations in 11 variables cannot have a unique 
solution. 
6. Find all possible values for h and k such that the system with augmented matrix

$$\left(\begin{array}{cc|c} 1 & 2 & h \\ 2 & k & 12 \end{array}\right)$$

has


(a) a unique solution. 
(b) infinitely many solutions. 
(c) no solution. 


7. For what value of s is the vector $v_1 = (s, -7, -6)$ a linear combination of the
vectors $v_2 = (1, 0, -1)$ and $v_3 = (1, -7, -4)$?
8. Let a, b be real numbers. Solve in real numbers the system 


x+y=a 
y+z=b 
z+t=a 
t+x=b 


3.4 Computing the Inverse of a Matrix 


Recall that a matrix $A \in M_n(F)$ is invertible if there is a matrix B such that $AB = BA = I_n$. Such a matrix is then unique and is called the inverse of A and denoted $A^{-1}$. A fundamental observation is that elementary matrices are invertible, which follows immediately from the fact that elementary row operations on matrices are reversible (this also shows that the inverse of an elementary matrix is still an elementary matrix). For instance, if a matrix E is obtained from $I_n$ by exchanging rows i and j, then $E^{-1}$ is obtained from $I_n$ by doing the same operation, that is, $E^{-1} = E$. If E is obtained by multiplying row i of $I_n$ by a nonzero scalar c, then $E^{-1}$ is obtained by multiplying row i of $I_n$ by $c^{-1}$. Also, if E is obtained by adding λ times row j to row i in $I_n$, then $E^{-1}$ is obtained by adding −λ times row j to row i in $I_n$. Due to its importance, let us state this as a proposition:


Proposition 3.22. Elementary matrices are invertible and their inverses are also 
elementary matrices. 


Here is an important consequence of the previous proposition and 
Proposition 3.14. 


Theorem 3.23. For a matrix $A \in M_n(F)$ the following statements are equivalent:

(a) A is invertible.
(b) $A_{\mathrm{ref}} = I_n$.
(c) A is a product of elementary matrices.
Proof. First, let us note that any product of elementary matrices is invertible, since any elementary matrix is invertible and since invertible matrices are stable under product. This already proves that (c) implies (a). Assume that (a) holds. By Proposition 3.14 and our initial observation, we can find an invertible matrix B such that $A_{\mathrm{ref}} = BA$. Since A is invertible, so is BA and so $A_{\mathrm{ref}}$ is invertible. In particular, all rows in $A_{\mathrm{ref}}$ are nonzero (if $A_{\mathrm{ref}}$ has an entire row consisting of zeros, then the same row of $A_{\mathrm{ref}}C$ is zero for every matrix C, so $A_{\mathrm{ref}}C$ is never equal to $I_n$) and so $A_{\mathrm{ref}}$ has n pivots, one in each column. Since moreover $A_{\mathrm{ref}}$ is in reduced row-echelon form, we must have $A_{\mathrm{ref}} = I_n$. Thus (b) holds.

Finally, if (b) holds, then by Proposition 3.14 we can find a matrix B which is a product of elementary matrices such that $BA = I_n$. By the previous proposition B is invertible and $B^{-1}$ is a product of elementary matrices. Since $BA = I_n$, we have $A = B^{-1}BA = B^{-1}$ and so A is a product of elementary matrices. Thus (b) implies (c) and the theorem is proved. □
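The proof of Theorem 3.23 is constructive. Suppose Gaussian reduction turns A into $I_n$ through row operations with elementary matrices $E_1, \dots, E_k$, in this order. Then $E_k \cdots E_1 A = I_n$ and hence $A = E_1^{-1} E_2^{-1} \cdots E_k^{-1}$, where each factor is elementary by Proposition 3.22. The Python sketch below is our own illustration and its names are hypothetical. It records these inverse matrices during the reduction. It returns None when some column has no pivot, that is, when $A_{\mathrm{ref}} \neq I_n$.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def elementary_factors(A):
    """Return elementary matrices F_1, ..., F_k with A = F_1 F_2 ... F_k,
    or None if the reduced row-echelon form of A is not I_n."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    factors = []  # inverses of the row operations, in the order they are performed
    for c in range(n):
        k = next((i for i in range(c, n) if M[i][c] != 0), None)
        if k is None:
            return None                       # no pivot in column c: A_ref != I_n
        if k != c:                            # row swap; it is its own inverse
            M[c], M[k] = M[k], M[c]
            F = identity(n)
            F[c], F[k] = F[k], F[c]
            factors.append(F)
        p = M[c][c]
        if p != 1:                            # scaling by 1/p; the inverse scales by p
            M[c] = [x / p for x in M[c]]
            F = identity(n)
            F[c][c] = p
            factors.append(F)
        for i in range(n):
            if i != c and M[i][c] != 0:       # row_i -= f * row_c; inverse adds f * row_c
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
                F = identity(n)
                F[i][c] = f
                factors.append(F)
    return factors


if __name__ == "__main__":
    A = [[1, 2, 2, 2], [2, 1, 2, 2], [2, 2, 1, 2], [2, 2, 2, 1]]   # Example 3.27
    factors = elementary_factors(A)
    P = identity(len(A))
    for F in factors:
        P = matmul(P, F)
    print(len(factors), "elementary factors; their product equals A:", P == A)
```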


The following proposition expresses the solutions of the system AX = b when A is an invertible matrix. Of course, in order to make this effective, one should have an algorithm allowing one to compute $A^{-1}$. We will see such an algorithm (based again on row-reduction) later on (see the discussion following Corollary 3.26).

Proposition 3.24. If $A \in M_n(F)$ is invertible, then for all $b \in F^n$ the system AX = b has a unique solution, namely $X = A^{-1}b$.

Proof. Let X be a solution of the system. Multiplying the equality AX = b on the left by $A^{-1}$ yields $A^{-1}(AX) = A^{-1}b$. Since

$$A^{-1}(AX) = (A^{-1}A)X = I_nX = X,$$

we conclude that $X = A^{-1}b$, thus the system has at most one solution. To see that this is indeed a solution, we compute

$$A(A^{-1}b) = (AA^{-1})b = I_nb = b.$$

□


It turns out that the converse is equally true, but much trickier. In fact, we have the following fundamental result:

Theorem 3.25. Let $A \in M_n(F)$ be a matrix. The following statements are equivalent:

(a) A is invertible.
(b) For all $b \in F^n$ the system AX = b has a unique solution $X \in F^n$.
(c) For all $b \in F^n$ the system AX = b is consistent.


Proof. We have already proved that (a) implies (b). It is clear that (b) implies (c), so let us assume that (c) holds. Let $A_{\mathrm{ref}}$ be the reduced row-echelon form of A. By Proposition 3.14 we can find a matrix B which is a product of elementary matrices (thus invertible) such that $A_{\mathrm{ref}} = BA$. We deduce that the system $A_{\mathrm{ref}}X = Bb$ has at least one solution for all $b \in F^n$ (indeed, if AX = b, then $A_{\mathrm{ref}}X = BAX = Bb$). Now, for any $b' \in F^n$ we can find b such that $b' = Bb$, by taking $b = B^{-1}b'$. We conclude that the system $A_{\mathrm{ref}}X = b$ is consistent for every $b \in F^n$. But then any row of $A_{\mathrm{ref}}$ must be nonzero (if row i is zero, then choosing any vector b with the ith coordinate equal to 1 yields an inconsistent system) and, as in the first paragraph of the proof of Theorem 3.23, we obtain $A_{\mathrm{ref}} = I_n$. Using Theorem 3.23 we conclude that A is invertible and so (a) holds. The theorem is proved. □


Here is a nice and nontrivial consequence of the previous theorem:

Corollary 3.26. Let $A, B \in M_n(F)$ be matrices.

(a) If $AB = I_n$, then A is invertible and $B = A^{-1}$.
(b) If $BA = I_n$, then A is invertible and $B = A^{-1}$.

Proof. (a) For any $b \in F^n$ the vector X = Bb satisfies AX = A(Bb) = (AB)b = b, thus the system AX = b is consistent for every $b \in F^n$. By the previous theorem, A is invertible. Multiplying the equality $AB = I_n$ on the left by $A^{-1}$ we obtain $B = A^{-1}AB = A^{-1}$.

(b) By part (a), we know that B is invertible and $A = B^{-1}$. But then A itself is invertible and $A^{-1} = B$, since by definition $B \cdot B^{-1} = B^{-1} \cdot B = I_n$. □


The previous corollary gives us a practical way of deciding whether a square matrix A is invertible and, if this is the case, computing its inverse. Indeed, A is invertible if and only if we can find a matrix X such that $AX = I_n$, as then $X = A^{-1}$. The equation $AX = I_n$ is equivalent to n linear systems: $AX_1 = e_1$, $AX_2 = e_2$, ..., $AX_n = e_n$, where $e_i$ is the ith column of $I_n$ and $X_i$ denotes the ith column of X. We already know how to solve linear systems, using the reduced row-echelon form, so this gives us a practical way of computing X (if at least one of these systems is inconsistent, then A is not invertible).

In practice, one can avoid solving n linear systems by the following trick: instead of considering n augmented matrices $[A \mid e_i]$, consider only one augmented matrix $[A \mid I_n]$, in which we add the matrix $I_n$ to the right of A (thus $[A \mid I_n]$ has 2n columns). Thus we solve simultaneously the n linear systems we are interested in by enlarging the augmented matrix! Now find the reduced row-echelon form $[A' \mid X]$ of this n × 2n matrix $[A \mid I_n]$. If A’ is different from $I_n$, then A is not invertible. If $A' = I_n$, then the inverse of A is simply the matrix X.
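The trick with $[A \mid I_n]$ can be sketched in a few lines of Python. This is an illustration under our own conventions: the name `inverse` is hypothetical and entries are converted to exact fractions. The left block is reduced column by column. If some column of the left block has no pivot, then $A' \neq I_n$ and the function reports that A is not invertible by returning None.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by row-reducing [A | I_n]; return None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # A pivot for column c must be found in rows c, ..., n - 1.
        k = next((i for i in range(c, n) if M[i][c] != 0), None)
        if k is None:
            return None          # column c has no pivot, so A' != I_n
        M[c], M[k] = M[k], M[c]
        p = M[c][c]
        M[c] = [x / p for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [row[n:] for row in M]   # the block X, equal to the inverse of A


if __name__ == "__main__":
    A = [[1, 2, 2, 2], [2, 1, 2, 2], [2, 2, 1, 2], [2, 2, 2, 1]]   # Example 3.27
    print([[int(7 * x) for x in row] for row in inverse(A)])
    # [[-5, 2, 2, 2], [2, -5, 2, 2], [2, 2, -5, 2], [2, 2, 2, -5]]
    print(inverse([[1, 2], [2, 4]]))   # None: this matrix is not invertible
```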


Example 3.27. Consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 2 & 2 \\ 2 & 1 & 2 & 2 \\ 2 & 2 & 1 & 2 \\ 2 & 2 & 2 & 1 \end{pmatrix} \in M_4(\mathbb{R}).$$

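We will try to see whether the matrix A is invertible and, if this is the case, compute its inverse. Consider the augmented matrix $B = [A \mid I_4]$ and let us find its reduced row-echelon form using Gaussian reduction. Subtracting twice the first row from each of the second, third, and fourth rows of B, we end up with

$$B_1 = \left(\begin{array}{cccc|cccc} 1 & 2 & 2 & 2 & 1 & 0 & 0 & 0 \\ 0 & -3 & -2 & -2 & -2 & 1 & 0 & 0 \\ 0 & -2 & -3 & -2 & -2 & 0 & 1 & 0 \\ 0 & -2 & -2 & -3 & -2 & 0 & 0 & 1 \end{array}\right).$$

Multiply the second row by −1/3. In the new matrix, add twice the second row to the third and fourth rows. We end up with

$$B_2 = \left(\begin{array}{cccc|cccc} 1 & 2 & 2 & 2 & 1 & 0 & 0 & 0 \\ 0 & 1 & \frac{2}{3} & \frac{2}{3} & \frac{2}{3} & -\frac{1}{3} & 0 & 0 \\ 0 & 0 & -\frac{5}{3} & -\frac{2}{3} & -\frac{2}{3} & -\frac{2}{3} & 1 & 0 \\ 0 & 0 & -\frac{2}{3} & -\frac{5}{3} & -\frac{2}{3} & -\frac{2}{3} & 0 & 1 \end{array}\right).$$

Multiply the third row by −3/5. In the new matrix add 2/3 times the third row to the fourth one, then multiply the fourth row by −5/7. Continuing the Gaussian reduction in the usual way, we end up (after quite a few steps which are left to the reader) with the matrix

$$\left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & -\frac{5}{7} & \frac{2}{7} & \frac{2}{7} & \frac{2}{7} \\ 0 & 1 & 0 & 0 & \frac{2}{7} & -\frac{5}{7} & \frac{2}{7} & \frac{2}{7} \\ 0 & 0 & 1 & 0 & \frac{2}{7} & \frac{2}{7} & -\frac{5}{7} & \frac{2}{7} \\ 0 & 0 & 0 & 1 & \frac{2}{7} & \frac{2}{7} & \frac{2}{7} & -\frac{5}{7} \end{array}\right).$$

This shows that A is invertible and

$$A^{-1} = \begin{pmatrix} -\frac{5}{7} & \frac{2}{7} & \frac{2}{7} & \frac{2}{7} \\ \frac{2}{7} & -\frac{5}{7} & \frac{2}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{2}{7} & -\frac{5}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{2}{7} & \frac{2}{7} & -\frac{5}{7} \end{pmatrix}.$$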


Let us take a closer look at this example, with another proof (this proof works 
in general when the coefficients of the matrix have sufficient symmetry). Let us 
consider solving the system AX = Y. This can be written 


$$\begin{cases} x_1 + 2x_2 + 2x_3 + 2x_4 = y_1 \\ 2x_1 + x_2 + 2x_3 + 2x_4 = y_2 \\ 2x_1 + 2x_2 + x_3 + 2x_4 = y_3 \\ 2x_1 + 2x_2 + 2x_3 + x_4 = y_4 \end{cases}$$

We can easily solve this system by introducing

$$S = x_1 + x_2 + x_3 + x_4.$$

Then the equations become

$$2S - x_i = y_i, \quad 1 \le i \le 4.$$

Thus $x_i = 2S - y_i$. Taking into account that

$$S = x_1 + x_2 + x_3 + x_4 = (2S - y_1) + (2S - y_2) + (2S - y_3) + (2S - y_4) = 8S - (y_1 + y_2 + y_3 + y_4),$$

we deduce that

$$S = \frac{y_1 + y_2 + y_3 + y_4}{7}$$

and so

$$x_1 = 2S - y_1 = -\frac{5}{7}y_1 + \frac{2}{7}y_2 + \frac{2}{7}y_3 + \frac{2}{7}y_4,$$

and similarly for $x_2, x_3, x_4$. This shows that for any choice of $Y \in \mathbb{R}^4$ the system AX = Y is consistent. Thus A is invertible and the solution of the system is given by $X = A^{-1}Y$. If the first row of $A^{-1}$ is (a, b, c, d), then

$$x_1 = ay_1 + by_2 + cy_3 + dy_4.$$

But since we know that

$$x_1 = -\frac{5}{7}y_1 + \frac{2}{7}y_2 + \frac{2}{7}y_3 + \frac{2}{7}y_4$$

and since $y_1, y_2, y_3, y_4$ are arbitrary, we deduce that

$$(a, b, c, d) = \left(-\frac{5}{7}, \frac{2}{7}, \frac{2}{7}, \frac{2}{7}\right).$$

The other rows are found in the same way. In this way we can find the matrix $A^{-1}$ and, of course, we obtain the same result as before (but the reader will have noticed that we obtain this result with much less effort!).
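The symmetry exploited above can also be expressed in matrix form. In the following derivation, which is our own supplement to the text, J denotes the 4 × 4 matrix all of whose entries equal 1 (this notation is introduced only here). Then $A = 2J - I_4$ and $J^2 = 4J$, since every entry of $J^2$ equals 1 + 1 + 1 + 1 = 4. The inverse found above is $\frac{2}{7}J - I_4$, as one product confirms:

$$\left(2J - I_4\right)\left(\tfrac{2}{7}J - I_4\right) = \tfrac{4}{7}J^2 - 2J - \tfrac{2}{7}J + I_4 = \tfrac{16}{7}J - \tfrac{16}{7}J + I_4 = I_4.$$

By Corollary 3.26, this single product already shows that $A^{-1} = \frac{2}{7}J - I_4$. Its diagonal entries are $\frac{2}{7} - 1 = -\frac{5}{7}$ and its off-diagonal entries are $\frac{2}{7}$.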
3.4.1 Problems for Practice

1. Is the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ -1 & -2 & -4 \\ 0 & 1 & 1 \end{pmatrix}$$

invertible? If so, compute its inverse.
2. For which real numbers x is the matrix

$$A = \begin{pmatrix} 1 & x & 1 \\ 0 & 1 & x \\ 1 & 0 & 1 \end{pmatrix}$$

invertible? For any such number x, compute the inverse of A.
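3. Let x, y, z be real numbers. Compute the inverse of the matrix

$$A = \begin{pmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{pmatrix}.$$

4. Determine the inverse of the matrix

$$A = \begin{pmatrix} n & 1 & 1 & \cdots & 1 \\ 1 & n & 1 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & n \end{pmatrix} \in M_n(\mathbb{R}).$$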


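5. Let a be a real number. Determine the inverse of the matrix

$$A = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ a & 1 & 0 & \cdots & 0 & 0 \\ a^2 & a & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a^{n-2} & a^{n-3} & a^{n-4} & \cdots & 1 & 0 \\ a^{n-1} & a^{n-2} & a^{n-3} & \cdots & a & 1 \end{pmatrix} \in M_n(\mathbb{R}).$$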


Chapter 4 
Vector Spaces and Subspaces 


Abstract In this chapter we formalize and generalize many of the ideas 
encountered in the previous chapters, by introducing the key notion of vector 
space. The central focus is a good theory of dimension for vector spaces spanned 
by finitely many vectors. This requires a detailed study of spanning and linearly
independent families of vectors in a vector space.


Keywords Vector space • Vector subspace • Span • Linearly independent set
• Dimension • Direct sum • Basis


In this chapter we formalize and generalize many of the ideas encountered in the 
previous chapters, by introducing the key notion of vector space. It turns out that 
many familiar spaces of functions are vector spaces, and developing an abstract 
theory of vector spaces has the advantage of being applicable to all these familiar 
spaces simultaneously. A good deal of work is required in order to define a good 
notion of dimension for vector spaces, but once the theory is developed, a whole 
family of nontrivial tools is at our disposal and can be used for a deeper study of
vector spaces.

Throughout this chapter we fix a field $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}, \mathbb{F}_2\}$, which the reader might want
to take to be $\mathbb{R}$ or $\mathbb{C}$, for simplicity. The elements of F are called scalars.


4.1 Vector Spaces-Definition, Basic Properties and Examples 


We refer the reader to the appendix on algebraic preliminaries for the notion of 
group and commutative group (we will recall below everything we need, anyway). 
Let us simply recall that a commutative group (V, +) is a set V endowed with an
addition rule $+ : V \times V \to V$, denoted $(v, w) \mapsto v + w$, and satisfying natural
identities (which are supposed to mimic the properties of addition on integers,
rational numbers, real numbers, etc.). We are now ready to introduce a fundamental
definition, that of a vector space over a field F. The prototype example to keep in
mind is $F^n$ (n being any positive integer), which has already been introduced in the
first chapter.
Definition 4.1. A vector space over F or an F-vector space is a commutative group (V, +) endowed with a map $F \times V \to V$, called scalar multiplication and denoted $(a, v) \mapsto a \cdot v$, such that for all $a, b \in F$ and $v, w \in V$ we have

a) $a \cdot (v + w) = a \cdot v + a \cdot w$ and $(a + b) \cdot v = a \cdot v + b \cdot v$.
b) $1 \cdot v = v$.
c) $(ab) \cdot v = a \cdot (b \cdot v)$.


The elements of V are called vectors. 


Remark 4.2. 1) We usually write av instead of a - v. 
2) By definition, a vector space over F is nonempty! 


Before giving quite a few examples of vector spaces we will make the definition 
more explicit and then try to explain different ways of understanding a vector space. 

Thus a vector space over F is a set V, whose elements are called vectors, in
which two operations can be performed:

• addition, taking two vectors v, w and returning a vector v + w;
• scalar multiplication, taking a scalar $c \in F$ and a vector $v \in V$, and returning the
vector cv.

Moreover, the following properties/rules should hold:

1) addition is commutative: v + w = w + v for all vectors $v, w \in V$.

2) addition is associative: (u + v) + w = u + (v + w) for all vectors $u, v, w \in V$.

3) addition has an identity: there is a vector $0 \in V$ such that 0 + v = v + 0 = v for
all $v \in V$.

4) there are additive inverses: for all $v \in V$ there is a vector $w \in V$ such that
v + w = 0.

5) we have 1v = v for all $v \in V$.

6) for all scalars $a, b \in F$ and all $v \in V$ we have (ab)v = a(bv).

7) scalar multiplication is additive: for all scalars $a \in F$ and all $v, w \in V$ we have
a(v + w) = av + aw.

8) scalar multiplication distributes over addition of scalars: for all scalars $a, b \in F$ and all
$v \in V$ we have (a + b)v = av + bv.


One can hardly come up with a longer definition of a mathematical object, but 
one has to understand that most of the imposed conditions are natural and fairly 
easy to check. Actually, most of the time we will not even bother checking these 
conditions since they will be apparent from the description of the space V and its
operations. The key point is that we simply want to add vectors and multiply them 
by scalars without having too many difficulties. 
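When both F and V are finite, properties 1–8 can even be verified by brute force. The following Python sketch (our own illustration, not taken from the text) does this for $F = F_2 = \{0, 1\}$ with arithmetic modulo 2 and $V = F_2^2$ with component-wise operations; all function names are hypothetical helpers chosen for this example.

```python
from itertools import product

# The field F2 = {0, 1}: addition and multiplication modulo 2.
F2 = (0, 1)

def s_add(a, b):
    return (a + b) % 2

def s_mul(a, b):
    return (a * b) % 2

# The vector space V = F2^2: pairs of elements of F2, operations component-wise.
V = list(product(F2, repeat=2))
ZERO = (0, 0)

def v_add(v, w):
    return tuple(s_add(x, y) for x, y in zip(v, w))

def v_scale(c, v):
    return tuple(s_mul(c, x) for x in v)

def failed_properties():
    """Return the sorted list of properties 1-8 that fail; an empty list means all hold."""
    failed = set()
    for u, v, w in product(V, repeat=3):
        if v_add(v, w) != v_add(w, v):
            failed.add(1)  # commutativity
        if v_add(v_add(u, v), w) != v_add(u, v_add(v, w)):
            failed.add(2)  # associativity
    for v in V:
        if v_add(ZERO, v) != v or v_add(v, ZERO) != v:
            failed.add(3)  # additive identity
        if not any(v_add(v, w) == ZERO for w in V):
            failed.add(4)  # additive inverse
        if v_scale(1, v) != v:
            failed.add(5)
    for a, b in product(F2, repeat=2):
        for v, w in product(V, repeat=2):
            if v_scale(s_mul(a, b), v) != v_scale(a, v_scale(b, v)):
                failed.add(6)
            if v_scale(a, v_add(v, w)) != v_add(v_scale(a, v), v_scale(a, w)):
                failed.add(7)
            if v_scale(s_add(a, b), v) != v_add(v_scale(a, v), v_scale(b, v)):
                failed.add(8)
    return sorted(failed)

print(failed_properties())  # []
```

Such an exhaustive check is only possible because $F_2^2$ has four elements; over infinite fields such as Q or R the properties must be proved, not tested.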


Remark 4.3. An important observation is that the addition $+ : V \times V \to V$ is an
internal operation, while the scalar multiplication $\cdot : F \times V \to V$ is an external
operation.


Let us make a few simple, but important remarks concerning the previous rules. 
First of all, one should be careful to distinguish the scalar $0 \in F$ and the vector
$0 \in V$ which is the identity for the addition rule. Of course, they are denoted in
exactly the same way, but they live in quite different worlds, so that there should
not be any risk of confusion. Next, since addition is associative, we will not bother
writing $((u + v) + w) + z$, but simply $u + v + w + z$.

Now let us focus a little bit on property 4. So let us start with any vector $v \in V$.
Property 4 ensures the existence of a vector $w \in V$ for which $v + w = 0$. A natural
question is whether such a vector is unique. The answer is positive (and this holds
in any commutative group): suppose that $w'$ is another such vector. Then using properties 1
(commutativity), 2 (associativity) and 3 we obtain


$$w = w + 0 = w + (v + w') = (w + v) + w' = 0 + w' = w'.$$


Thus w is uniquely determined by v, and we will denote it as $-v$.

Another natural question is whether this vector $-v$ coincides with the vector
$(-1)v$ obtained by multiplying v by the scalar $-1$. Since mathematical definitions
are (usually) coherent, one expects that the answer is again positive, which is the
case. Indeed, on the one hand properties 5 and 8 yield

$$(-1)v + v = (-1)v + 1v = (-1 + 1)v = 0v,$$

and on the other hand property 8 gives

$$0v + 0v = (0 + 0)v = 0v.$$

Adding $-(0v)$ to both sides of the previous relation we obtain $0v = 0$. Combined with the first relation, this gives $(-1)v + v = 0$, so by the uniqueness of additive inverses $(-1)v = -v$. Thus

$$0v = 0, \qquad (-1)v = -v$$

for all $v \in V$. There are a lot of such formulae which can be obtained by
simple algebraic manipulations straight from the definitions. Again, we will simplify
notations and write $v - w$ for $v + (-w)$.

In the proof that $0v = 0$ we used a trick which deserves to be glorified since it is
very useful: 


Proposition 4.4 (Cancellation law). Let V be a vector space over F. 


a) If $v + u = w + u$ for some $u, v, w \in V$, then $v = w$.
b) If $au = av$ for some $u, v \in V$ and some nonzero $a \in F$, then $u = v$.


Proof. a) We have
$$v = v + 0 = v + (u - u) = (v + u) - u = (w + u) - u = w + (u - u) = w + 0 = w,$$


hence $v = w$, as desired.
b) Similarly, we have


$$u = 1 \cdot u = (a^{-1}a)u = a^{-1}(au) = a^{-1}(av) = (a^{-1}a)v = 1v = v.$$ □
It is now time to see some concrete vector spaces. We have already encountered
quite a few in previous chapters. Let us explain why. Let us fix a field F. 

First, the field F itself is a vector space over F. Indeed, addition and multi- 
plication on F satisfy all properties 1-8 by definition of a field! Note here that 
scalar multiplication coincides with the multiplication in F. The zero vector 0 in F 
coincides with the natural unit for addition in F. 

Another very important class of vector spaces over F occurs as follows: let K 
be a field containing F. Then K is a vector space over F, for essentially the same 
reasons as in the previous paragraph. Important examples of this situation are $Q \subset R$,
$R \subset C$ and $Q \subset C$. Thus R is a vector space over Q, C is a vector space over R, and C
is a vector space over Q.

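Next, consider a positive integer n and recall that $F^n$ is the set of n-tuples of elements of F, written in column form,

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

We add two such vectors component-wise and we re-scale them by scalars in F component-wise:

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix} \quad \text{and} \quad c\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}.$$

It is not difficult to check that properties 1–8 are all satisfied: they all follow from the corresponding properties of addition and multiplication in F, since all operations are defined component-wise. Thus $F^n$ is a vector space for these two operations. Its zero vector 0 is the vector having all coordinates equal to 0:

$$0 = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$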
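The component-wise operations on $F^n$ are easy to implement. The sketch below (an illustration, not part of the original text) takes F = Q, represented exactly by Python's `fractions.Fraction`, and stores a vector of $F^n$ as a tuple of length n rather than a column; the helper names `vec_add`, `vec_scale` and `zero` are our own.

```python
from fractions import Fraction

def vec_add(x, y):
    """Component-wise sum of two vectors of F^n (stored as tuples)."""
    if len(x) != len(y):
        raise ValueError("both vectors must lie in the same F^n")
    return tuple(a + b for a, b in zip(x, y))

def vec_scale(c, x):
    """Multiply every coordinate of x by the scalar c."""
    return tuple(c * a for a in x)

def zero(n):
    """The zero vector of F^n: all coordinates equal to 0."""
    return tuple(Fraction(0) for _ in range(n))

x = (Fraction(1), Fraction(-2), Fraction(1, 3))
y = (Fraction(2), Fraction(5), Fraction(2, 3))
print(vec_add(x, y))              # (Fraction(3, 1), Fraction(3, 1), Fraction(1, 1))
print(vec_scale(Fraction(3), x))  # (Fraction(3, 1), Fraction(-6, 1), Fraction(1, 1))
print(vec_add(x, zero(3)) == x)   # True
```

Using exact fractions instead of floating-point numbers keeps the arithmetic faithful to the field Q.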
Consider next the set $V = M_{m,n}(F)$ of $m \times n$ matrices with entries in F, where
m, n are given positive integers. Recall that addition and scalar multiplication on V
are defined component-wise by

$$[a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}] \quad \text{and} \quad c[a_{ij}] = [ca_{ij}]$$

for matrices $[a_{ij}], [b_{ij}] \in V$ and scalars $c \in F$. Again, all properties 1–8 follow
from the corresponding properties of the operations in F. The zero vector in V is
the matrix $O_{m,n}$, all of whose entries are equal to 0.

We consider now function spaces. In complete generality, let X be any
nonempty set and consider the set $V = F^X$ of functions $f : X \to F$. We can
define addition and scalar multiplication on V by the rules

$$(f + g)(x) = f(x) + g(x) \quad \text{and} \quad (cf)(x) = cf(x)$$

for $c \in F$ and $x \in X$. Then V is a vector space over F, again thanks to the fact that
all operations are induced directly from the corresponding operations in F. The zero
vector 0 of V is the map $0 : X \to F$ sending every $x \in X$ to $0 \in F$. Note that for
$X = \{1, 2, \ldots, n\}$ we recover the space $F^n$: giving a map $f : \{1, 2, \ldots, n\} \to F$
is the same as giving an n-tuple of elements of F (namely the images of $1, 2, \ldots, n$),
that is an element of $F^n$.

Fig. 4.1 The functions $f(x) = x^2$, $g(x) = 1 - 2x$ and their sum $(f + g)(x) = x^2 - 2x + 1$

One can impose further natural properties on the maps $f : X \to F$ and still
get vector spaces contained in $F^X$. For instance, consider the set C[0, 1] of real-valued
continuous functions on the interval [0, 1]. Thus an element of C[0, 1] is a
continuous map $f : [0, 1] \to R$. The addition and scalar multiplication are inherited
from those on the vector space $R^{[0,1]}$ of all real-valued maps on [0, 1]. For example,
if $f(x) = x^2$ and $g(x) = 1 - 2x$, then $(f + g)(x) = x^2 - 2x + 1$ for all x in the
interval [0, 1] (see Fig. 4.1).
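In a function space the vectors are functions, and the operations are defined point by point. The following Python sketch (illustrative only; `f_add` and `f_scale` are hypothetical helper names) builds $f + g$ and $cf$ as new functions and checks, at a few sample points of [0, 1], that for $f(x) = x^2$ and $g(x) = 1 - 2x$ the sum agrees with $x^2 - 2x + 1$. Checking finitely many points illustrates the identity but does not prove it.

```python
from fractions import Fraction

def f_add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def f_scale(c, f):
    """Pointwise re-scaling: (cf)(x) = c * f(x)."""
    return lambda x: c * f(x)

f = lambda x: x ** 2
g = lambda x: 1 - 2 * x
h = f_add(f, g)       # the function x -> x^2 - 2x + 1
k = f_scale(-1, g)    # the function x -> 2x - 1

samples = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)]
print([h(x) == x ** 2 - 2 * x + 1 for x in samples])  # [True, True, True, True]
```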

As another example, the function f given by $f(x) = \sin 5\pi x$ and its re-scaling
$\frac{3}{2}f$ are depicted in Fig. 4.2.

The key point is that the sum of two continuous maps is still a continuous map, 
and if f is continuous and c is a real number, then cf is also continuous. This 
ensures that the addition and scalar multiplication laws are well defined on C[0, 1]. 
They satisfy all properties 1–8, since these properties are already satisfied on the
larger space $R^{[0,1]}$. Thus C[0, 1] is itself a vector space over R, contained in $R^{[0,1]}$.
This is an example of a vector subspace of a vector space, a crucial notion which will
be introduced and studied at length in the sequel.

There is nothing special about the interval [0, 1]: for each interval I we obtain a
vector space of continuous real-valued maps on I. If the real numbers are replaced
with complex numbers, we obtain a corresponding vector space of complex-valued
continuous maps on I.
Fig. 4.2 The function $f(x) = \sin 5\pi x$ and its re-scaling by a factor of $\frac{3}{2}$


In fact, there are many other function spaces: we could consider the vector space 
of piecewise continuous functions, differentiable functions, bounded functions, 
integrable functions, etc., as long as any two functions in such a space add up to
another function in the same space and every re-scaling of a function in the space is
again a function in the space. The possibilities are endless.

Let us consider now another very important class of vector spaces, namely spaces 
of polynomials. Consider the set R[X] of polynomials in one variable and having 
real coefficients. This set is a vector space over R. Recall that the addition and re- 
scaling of polynomials are done coefficient-wise, so the fact that R[X] is a vector 
space over R follows directly from the fact that R itself is a field. The zero vector in 
R[X] is the zero polynomial (i.e., the polynomial all of whose coefficients are 0). 

The vector space R[X] contains a whole bunch of other vector spaces over R: for
each nonnegative integer n consider the set $R_n[X]$ of polynomials in R[X] whose
degree does not exceed n. For example the polynomials
3
1+ Xx’, gk t+ 7X’, 1-x-X? 
are all in $R_3[X]$, only the first two are in $R_2[X]$ and none of them is in $R_1[X]$. Since
the sum of two polynomials of degree at most n is a polynomial of degree at most
n, and since $\deg(cP) \le n$ for any real number c and any $P \in R_n[X]$, we deduce
that $R_n[X]$ is stable under the addition and scalar multiplication defined on R[X],
thus it forms itself a vector space over R. To be completely explicit, any polynomial
in $R_n[X]$ can be written in the form

$$a_0 + a_1X + a_2X^2 + \cdots + a_nX^n,$$

where $a_i$, $0 \le i \le n$, are real numbers, and then

$$(a_0 + a_1X + \cdots + a_nX^n) + (b_0 + b_1X + \cdots + b_nX^n) = (a_0 + b_0) + (a_1 + b_1)X + \cdots + (a_n + b_n)X^n$$

and for $c \in R$

$$c(a_0 + a_1X + \cdots + a_nX^n) = (ca_0) + (ca_1)X + \cdots + (ca_n)X^n.$$
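The coefficient-wise description translates directly into code. In the Python sketch below (an illustration with hypothetical helper names), a polynomial is stored as the list of its coefficients $[a_0, a_1, \ldots, a_n]$. The example also shows that the sum of two polynomials of degree 2 may have smaller degree, which is why one works with polynomials of degree at most n.

```python
def poly_add(p, q):
    """Coefficient-wise sum; p[i] is the coefficient of X^i."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad the shorter list with zero coefficients
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    """Multiply every coefficient by the scalar c."""
    return [c * a for a in p]

def degree(p):
    """Largest i with p[i] != 0, or None for the zero polynomial."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return None

P = [1, 0, 1]     # 1 + X^2
Q = [0, 2, -1]    # 2X - X^2
S = poly_add(P, Q)
print(S, degree(S))       # [1, 2, 0] 1
print(poly_scale(3, P))   # [3, 0, 3]
```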


4.1.1 Problems for Practice 


1. Consider the set $V = R^2$ endowed with an addition rule defined by
$$(x, y) + (x', y') = (x + x', y + y')$$
and with a multiplication rule by elements λ of R as follows
$$\lambda \cdot (x, y) = (\lambda x, 0).$$


Is V endowed with these operations a vector space over R? 
2. Define an operation $\oplus$ on $(0, \infty)$ by


$$a \oplus b = ab$$
for $a, b \in (0, \infty)$, and an external multiplication by real numbers as follows
$$a \odot b = b^a$$


for $a \in R$, $b \in (0, \infty)$. Does $(0, \infty)$ endowed with this new addition and scalar
multiplication become a vector space over R?
3. (Complexification of a real vector space) Let V be a vector space over R. Let $V_C$
be the product $V \times V$ (this is the set of ordered pairs (x, y) of elements of V)
endowed with the following addition rule


$$(x, y) + (x', y') = (x + x', y + y').$$


Also, for each complex number $z = a + ib$, consider the “multiplication by
z rule”


$$z \cdot (x, y) := (ax - by, ay + bx)$$


on $V_C$. Prove that $V_C$ endowed with these operations becomes a C-vector space
(this space is called the complexification of the vector space V).


4.2 Subspaces 


We have already seen in the previous subsection a lot of subspaces of concrete vector 
spaces. In this section we formalize the concept of vector subspace of a given vector 
space and then study some of its basic properties. 


Definition 4.5. Let V be an F-vector space. A subspace of V is a nonempty subset 
W of V which is stable under the operations of addition and scalar multiplication: 
$v + w \in W$ and $cv \in W$ for all $v, w \in W$ and $c \in F$.


Example 4.6. Let V be the vector space over R of all maps $f : R \to R$. Then the
following sets $V_1, V_2, V_3, V_4$ are subspaces of V.


I) $V_1 = \{f \in V \mid f \text{ is a continuous function on } R\}$.
II) $V_2 = \{f \in V \mid f \text{ is a differentiable function on } R\}$.
III) $V_3 = \{f \in V \mid f \text{ is an integrable function on the interval } [a, b]\}$,
where $a, b \in R$ are fixed.
IV) $V_4 = \{f \in V \mid \text{there exists } \delta \in R \text{ such that } |f(x)| \le \delta \text{ for all } x \in R\}$.


The previous definition invites a whole series of easy observations, which are 
however very useful in practice. 


Remark 4.7. 1. First, note that a vector subspace of a vector space must contain
the zero vector. Indeed, say W is a vector subspace of V. Since W is nonempty,
there is $v \in W$, and then $0 = 0v \in W$ because W is stable under scalar multiplication.
Thus if a subset W of a vector space V does not contain the zero vector, then this
subset W has no chance of being a vector subspace of V.

2. Next, a key observation is that if W is a subspace of V, then W becomes itself
an F-vector space, by restricting the operations in V to W. Indeed, since
properties 1–8 in the definition of a vector space are satisfied in V, they are
automatically satisfied in the subset W of V (note that $0 \in W$ and $-w = (-1)w \in W$
for all $w \in W$, so properties 3 and 4 also hold inside W). This was essentially implicitly used
(or briefly explained) in the previous section, where many examples of vector
spaces were constructed as vector subspaces of some standard vector spaces.

3. In practice, one can avoid checking two conditions (stability under addition and
under scalar multiplication) by checking only one: $v + cw \in W$ whenever $v, w \in W$
and $c \in F$. More generally, a nonempty subset W of V is a vector subspace
if and only if


$$av + bw \in W$$


for all $a, b \in F$ and $v, w \in W$. We deduce by induction on n that if W
is a vector subspace of V and $w_1, \ldots, w_n \in W$ and $c_1, \ldots, c_n \in F$, then
$c_1w_1 + \cdots + c_nw_n \in W$.

4. Another very important observation is the stability under arbitrary intersections
of vector subspaces. More precisely, if $(W_i)_{i \in I}$ is a family of subspaces of
V, then


$$W := \bigcap_{i \in I} W_i$$


is again a subspace of V. Indeed, W is nonempty because it contains 0 (as any
subspace of V contains 0) and clearly W is stable under addition and scalar
multiplication, since each $W_i$ has this property.


Problem 4.8. Consider the vector space $V = R^3$ over R and the subsets $V_1, V_2$
defined by


$$V_1 = \{(x, y, z) \in R^3 \mid x + y + z = 1\},$$


$$V_2 = \{(x, y, z) \in R^3 \mid x + 2y + z > \sqrt{2}\}.$$


Which (if either) of these is a subspace of V?


Solution. Neither $V_1$ nor $V_2$ contains (0, 0, 0), thus they are not subspaces of V. □
Problem 4.9. Let $V = R^3$ and
$$U = \{(x, y, z) \in R^3 \mid x^2 + y^2 + z^2 \le 1\}.$$


Is U a subspace of V?


Solution. U is not a subspace of V, since the vector $u = (1, 0, 0)$ belongs to U, but
the vector $2u = (2, 0, 0)$ does not belong to U. □


Problem 4.10. Determine if W is a subspace of V where 


(a) V = C[0, 1] and W consists of those functions f in V for which f(0) = 0.
(b) V = C[0, 1] and W consists of those functions f in V for which f(1) = 1.
(c) V = C[0, 1] and W consists of those functions f in V for which


$$\int_0^1 f(x)\,dx = 0.$$


(d) V = C[0, 1] and W consists of those functions f in V for which


$$\int_0^1 f(x)\,dx = 1.$$


(e) V is the space of three times differentiable functions on [0, 1] and W consists of
those functions in V whose third derivative is 0.


Solution. a) If f(0) = 0 and g(0) = 0, then $(f + cg)(0) = 0$ for all $c \in R$, and
f + cg is a continuous map. Thus W is a subspace of V.

b) W does not contain the zero element of V (which is the constant map equal to
0), thus W is not a subspace of V.

c) If $f, g \in W$, then for all $c \in R$ the map f + cg is continuous and


$$\int_0^1 (f + cg)(x)\,dx = \int_0^1 f(x)\,dx + c\int_0^1 g(x)\,dx = 0,$$


thus $f + cg \in W$. It follows that W is a subspace of V.
d) W does not contain the zero map in V, thus it is not a subspace of V.
e) If f, g are three times differentiable and their third derivatives are 0, then f + cg has
the same property for all real numbers c, since $(f + cg)''' = f''' + cg'''$. Thus
W is a subspace of V (consisting actually of polynomial functions of degree at
most 2). □


Problem 4.11. Let U and V be the sets of vectors
$$U = \{(x_1, x_2) \mid x_1, x_2 \ge 0\} \quad \text{and} \quad V = \{(x_1, x_2) \mid x_1x_2 \ge 0\}$$


in $R^2$.


(a) Show that U is closed under addition.
(b) Show that V is closed under re-scaling.
(c) Show that neither U nor V is a subspace of $R^2$.


Solution. It is clear that U is stable under addition, since nonnegative real numbers
are closed under addition. To see that V is closed under re-scaling, consider a scalar
c and $v = (x_1, x_2)$ in V. Then $cv = (cx_1, cx_2)$ and $(cx_1)(cx_2) = c^2x_1x_2 \ge 0$
because $c^2 \ge 0$ and $x_1x_2 \ge 0$.

U is not a subspace of $R^2$ as $v = (1, 1) \in U$ but $-v = (-1)v \notin U$. V
is not a subspace of $R^2$ since $v_1 = (2, 2) \in V$, $v_2 = (-1, -3) \in V$, but
$v_1 + v_2 = (1, -1) \notin V$. □




The union of two subspaces $W_1, W_2$ of V is almost never a subspace of V, as
the following problem shows.


Problem 4.12. Let V be a vector space over a field F and let $V_1, V_2$ be subspaces
of V. Prove that the union of $V_1, V_2$ is a subspace of V if and only if


$$V_1 \subset V_2 \quad \text{or} \quad V_2 \subset V_1.$$


Solution. If $V_1 \subset V_2$ (resp. $V_2 \subset V_1$), then $V_1 \cup V_2 = V_2$ (resp. $V_1 \cup V_2 = V_1$).
Therefore in both cases $V_1 \cup V_2$ is a subspace of V.

Conversely, suppose that $V_1 \cup V_2$ is a subspace of V. If $V_1 \subset V_2$, then we are
done, so suppose that this is not the case. Thus we can find $v \in V_1$ which does not
belong to $V_2$. We will prove that $V_2 \subset V_1$.

Take any vector $x \in V_2$. Since $V_1 \cup V_2$ is a subspace of V containing x and v,
it contains their sum x + v. Thus $x + v \in V_1$ or $x + v \in V_2$. If $x + v \in V_2$, then
$v = (x + v) - x \in V_2$, since $V_2$ is a subspace of V. This contradicts the choice
of v, thus we must have $x + v \in V_1$. Since $v \in V_1$, we also have $-v \in V_1$, and so
$x = (x + v) - v \in V_1$. Thus any element of $V_2$ belongs to $V_1$ and we have $V_2 \subset V_1$,
as desired. □


We now define a very important operation on subspaces of an F-vector space:


Definition 4.13. Let $W_1, W_2, \ldots, W_n$ be subspaces of a vector space V. Their sum
$W_1 + W_2 + \cdots + W_n$ is the subset of V consisting of all vectors $w_1 + w_2 + \cdots + w_n$
with $w_1 \in W_1, \ldots, w_n \in W_n$.


One could extend the previous definition to an arbitrary family $(W_i)_{i \in I}$ of
subspaces of V. In this case $\sum_{i \in I} W_i$ consists of all sums $\sum_{i \in I} w_i$ with $w_i \in W_i$
for all $i \in I$ and all but finitely many of the vectors $w_i$ equal to zero, so that the sum
$\sum_{i \in I} w_i$ has only finitely many nonzero terms and thus makes sense, even if I is
infinite. In practice, however, we will deal with finite collections of subspaces. The
following result also holds for infinite families of vector subspaces, but in the sequel
we prefer to focus on finite families, for simplicity.


Proposition 4.14. If $W_1, W_2, \ldots, W_n$ are subspaces of a vector space V, then
$W_1 + W_2 + \cdots + W_n$ is a subspace of V.


Proof. Let us denote for simplicity $S = W_1 + W_2 + \cdots + W_n$. Note that S is nonempty,
since $0 = 0 + \cdots + 0 \in S$. Let $s, s' \in S$ and let c be a scalar. It remains to prove that
$s + cs' \in S$. By definition, we can find $w_1, \ldots, w_n$ and $w_1', \ldots, w_n'$ such that
$w_i, w_i' \in W_i$ for $1 \le i \le n$ and


$$s = w_1 + w_2 + \cdots + w_n, \qquad s' = w_1' + w_2' + \cdots + w_n'.$$

Then

$$s + cs' = w_1 + \cdots + w_n + c(w_1' + \cdots + w_n') = w_1 + \cdots + w_n + cw_1' + \cdots + cw_n' = (w_1 + cw_1') + \cdots + (w_n + cw_n').$$

Since $W_i$ is a subspace of V and since $w_i, w_i' \in W_i$, it follows that $w_i + cw_i' \in W_i$
for all $1 \le i \le n$. The previous displayed formula therefore expresses $s + cs'$ as a
sum of vectors in $W_1, \ldots, W_n$ and shows that $s + cs' \in S$. This finishes the proof
that S is a subspace of V. □


Problem 4.15. Prove that $W_1 + W_2 + \cdots + W_n$ is the smallest subspace of V
containing all subspaces $W_1, \ldots, W_n$.


Solution. It is clear that $W_1 + \cdots + W_n$ contains $W_1, W_2, \ldots, W_n$, since each vector
$w_i$ of $W_i$ can be written as $0 + 0 + \cdots + 0 + w_i + 0 + \cdots + 0$ and $0 \in W_1 \cap \cdots \cap W_n$. We
need to prove that if W is any subspace of V which contains each of the subspaces
$W_1, \ldots, W_n$, then W contains $W_1 + W_2 + \cdots + W_n$. Take any vector v of $W_1 + \cdots + W_n$.
By definition, we can write $v = w_1 + w_2 + \cdots + w_n$ for some vectors $w_i \in W_i$.
Since W contains $W_1, \ldots, W_n$, it contains each of the vectors $w_1, \ldots, w_n$. And since
W is a subspace of V, it must contain their sum, which is v. We proved that any
element of $W_1 + \cdots + W_n$ belongs to W, thus $W_1 + \cdots + W_n \subset W$ and the result
follows. □


We now introduce a second crucial notion, that of direct sum of subspaces: 


Definition 4.16. Let $W_1, W_2, \ldots, W_n$ be subspaces of a vector space V. We say that
$W_1, W_2, \ldots, W_n$ are in direct sum position if the equality


$$w_1 + w_2 + \cdots + w_n = 0$$


with $w_1 \in W_1, \ldots, w_n \in W_n$ forces $w_1 = w_2 = \cdots = w_n = 0$.


There are quite a few different ways of expressing this condition. Here is one 
of them: 


Proposition 4.17. Subspaces $W_1, \ldots, W_n$ of a vector space V are in direct sum
position if and only if every element of $W_1 + W_2 + \cdots + W_n$ can be uniquely written
as a sum $w_1 + \cdots + w_n$ with $w_1 \in W_1, \ldots, w_n \in W_n$.


Proof. Suppose that $W_1, \ldots, W_n$ are in direct sum position and take an element v of
$W_1 + \cdots + W_n$. By definition we can express $v = w_1 + \cdots + w_n$ with $w_i \in W_i$ for
all $1 \le i \le n$. Suppose that we can also write $v = w_1' + \cdots + w_n'$ with $w_i' \in W_i$. We
need to prove that $w_i = w_i'$ for all $1 \le i \le n$. Subtracting the two relations yields


$$0 = v - v = (w_1 - w_1') + (w_2 - w_2') + \cdots + (w_n - w_n').$$


Let $u_i = w_i - w_i'$. Since $W_i$ is a subspace of V, we have $u_i \in W_i$. Moreover,
$u_1 + \cdots + u_n = 0$. Since $W_1, \ldots, W_n$ are in direct sum position, it follows that
$u_1 = \cdots = u_n = 0$, and so $w_i = w_i'$ for all $1 \le i \le n$, which is what we needed.


Conversely, suppose every element of $W_1 + W_2 + \cdots + W_n$ can be written uniquely
as a sum of elements of $W_1, \ldots, W_n$. Then $0 = 0 + 0 + \cdots + 0$ must be the unique
decomposition of 0. Thus whenever $w_1 \in W_1, w_2 \in W_2, \ldots, w_n \in W_n$ satisfy
$w_1 + w_2 + \cdots + w_n = 0$, we have $w_1 = w_2 = \cdots = w_n = 0$. Thus $W_1, \ldots, W_n$ are
in direct sum position. □


Finally, we make another key definition: 


Definition 4.18. a) We say that a vector space V is the direct sum of its subspaces
$W_1, W_2, \ldots, W_n$ and write


$$V = W_1 \oplus W_2 \oplus \cdots \oplus W_n$$


if $W_1, W_2, \ldots, W_n$ are in direct sum position and $V = W_1 + W_2 + \cdots + W_n$.
b) If $V_1, V_2$ are subspaces of a vector space V, we say that $V_2$ is a complement (or
complementary subspace) of $V_1$ if $V_1 \oplus V_2 = V$.


By the previous results, $V = W_1 \oplus \cdots \oplus W_n$ if and only if every vector $v \in V$ can
be uniquely written as a sum $w_1 + w_2 + \cdots + w_n$, with $w_i \in W_i$ for all i. Hence, if
$V_1, V_2$ are subspaces of V, then $V_2$ is a complement of $V_1$ if and only if every vector
$v \in V$ can be uniquely expressed as $v = v_1 + v_2$ with $v_1 \in V_1$ and $v_2 \in V_2$.

The result of the following problem is extremely useful in practice. 


Problem 4.19. Prove that $V_2$ is a complement of $V_1$ if and only if $V_1 + V_2 = V$ and
$V_1 \cap V_2 = \{0\}$.


Solution. Assume that $V_2$ is a complement of $V_1$, thus $V = V_1 \oplus V_2$ and each $v \in V$
can be uniquely written as the sum of an element of $V_1$ and an element of $V_2$. This
clearly implies that $V_1 + V_2 = V$. If $v \in V_1 \cap V_2$, then we can write $v = v + 0 = 0 + v$
and by uniqueness $v = 0$, thus $V_1 \cap V_2 = \{0\}$.

Conversely, assume that $V_1 \cap V_2 = \{0\}$ and $V_1 + V_2 = V$. The second relation
implies that each vector of V is the sum of a vector in $V_1$ and one in $V_2$. Assume that
$v \in V$ can be written both as $v_1 + v_2$ and as $v_1' + v_2'$ with $v_1, v_1' \in V_1$ and $v_2, v_2' \in V_2$. Then
$v_1 - v_1' = v_2' - v_2$. Now the left-hand side belongs to $V_1$ while the right-hand side
belongs to $V_2$, thus they both belong to $V_1 \cap V_2 = \{0\}$ and so $v_1 = v_1'$ and $v_2 = v_2'$,
giving the desired uniqueness result. □


Example 4.20. 1. The vector space $V = R^2$ is the direct sum of its subspaces
$V_1 = \{(x, 0) \mid x \in R\}$ and $V_2 = \{(0, y) \mid y \in R\}$. Indeed, any $(x, y) \in R^2$ can
be uniquely written in the form $(a, 0) + (0, b)$, via $a = x$ and $b = y$.

2. Let $V = M_n(R)$ be the vector space of $n \times n$ matrices with real entries. If $V_1$
and $V_2$ are the subspaces of symmetric, respectively skew-symmetric matrices,
then $V = V_1 \oplus V_2$. Indeed, any matrix $A \in V$ can be uniquely written as the
sum of a symmetric matrix and a skew-symmetric matrix: the only way to have
$A = B + C$ with B symmetric and C skew-symmetric is via $B = \frac{1}{2}(A + {}^tA)$ and
$C = \frac{1}{2}(A - {}^tA)$.

3. Let V be the vector space of all real-valued maps on R. Let $V_1$ (respectively
$V_2$) be the subspace of V consisting of even (respectively odd) functions. Recall
that a map $f : R \to R$ is even (respectively odd) if $f(x) = f(-x)$ for all
x (respectively $f(-x) = -f(x)$ for all x). Then $V = V_1 \oplus V_2$. Indeed, for
any map f, the only way to write $f = g + h$ with g even and h odd is via

$$g(x) = \frac{f(x) + f(-x)}{2} \quad \text{and} \quad h(x) = \frac{f(x) - f(-x)}{2}.$$
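Both decompositions in parts 2 and 3 are given by explicit formulas, which can be checked on examples. The Python sketch below (ours, with hypothetical helper names) computes $B = \frac{1}{2}(A + {}^tA)$ and $C = \frac{1}{2}(A - {}^tA)$ for a square matrix stored as a list of rows, using exact rational arithmetic, and returns the even and odd parts of a function as new functions.

```python
from fractions import Fraction

def transpose(A):
    return [list(row) for row in zip(*A)]

def sym_skew(A):
    """Return (B, C) with B = (A + tA)/2 symmetric and C = (A - tA)/2 skew-symmetric."""
    At = transpose(A)
    n = len(A)
    B = [[Fraction(A[i][j] + At[i][j], 2) for j in range(n)] for i in range(n)]
    C = [[Fraction(A[i][j] - At[i][j], 2) for j in range(n)] for i in range(n)]
    return B, C

def even_odd(f):
    """Return (g, h) with g(x) = (f(x) + f(-x))/2 even and h(x) = (f(x) - f(-x))/2 odd."""
    g = lambda x: (f(x) + f(-x)) / 2
    h = lambda x: (f(x) - f(-x)) / 2
    return g, h

A = [[1, 2], [5, 3]]
B, C = sym_skew(A)
print(B)  # [[Fraction(1, 1), Fraction(7, 2)], [Fraction(7, 2), Fraction(3, 1)]]
print(C)  # [[Fraction(0, 1), Fraction(-3, 2)], [Fraction(3, 2), Fraction(0, 1)]]

f = lambda x: x ** 3 + x ** 2 + 1
g, h = even_odd(f)   # expected: g(x) = x^2 + 1 and h(x) = x^3
```

Evaluating g and h at Fraction arguments keeps the division by 2 exact.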


Problem 4.21. Let V be the space of continuous real-valued maps on $[-1, 1]$ and let

$$V_1 = \left\{ f \in V \;\middle|\; \int_{-1}^1 f(t)\,dt = 0 \right\}$$

and $V_2$ be the subset of V consisting of constant functions.


a) Prove that $V_1, V_2$ are subspaces of V.
b) Prove that $V = V_1 \oplus V_2$.


Solution. a) If $f_1, f_2$ are in $V_1$ and $c \in R$, then $cf_1 + f_2$ is continuous and

$$\int_{-1}^1 (cf_1 + f_2)(t)\,dt = c\int_{-1}^1 f_1(t)\,dt + \int_{-1}^1 f_2(t)\,dt = 0,$$

thus $cf_1 + f_2 \in V_1$ and $V_1$ is a subspace of V. It is clear that $V_2$ is a subspace
of V.

b) By the previous problem, we need to check that $V_1 \cap V_2 = \{0\}$ and $V = V_1 + V_2$.
Assume that $f \in V_1 \cap V_2$, thus f is constant and $\int_{-1}^1 f(t)\,dt = 0$. Say $f(t) = c$
for all $t \in [-1, 1]$, then

$$0 = \int_{-1}^1 f(t)\,dt = 2c,$$

thus $c = 0$ and $f = 0$. This shows that $V_1 \cap V_2 = \{0\}$.
In order to prove that $V = V_1 + V_2$, let $f \in V$ and let us try to write $f = c + g$
with c a constant and $g \in V_1$. We need to ensure that

$$\int_{-1}^1 g(t)\,dt = 0,$$

that is

$$\int_{-1}^1 (f(t) - c)\,dt = 0.$$

It suffices therefore to take

$$c = \frac{1}{2}\int_{-1}^1 f(t)\,dt$$

and $g = f - c$. □
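The formula $c = \frac{1}{2}\int_{-1}^1 f(t)\,dt$ can be tried out exactly when f is a polynomial, since then $\int_{-1}^1 t^k\,dt$ equals $\frac{2}{k+1}$ for even k and 0 for odd k. The following Python sketch restricts to polynomial f for that reason (a general continuous f would require numerical integration); the function names are hypothetical.

```python
from fractions import Fraction

def integral_m1_1(p):
    """Exact integral over [-1, 1] of the polynomial sum_k p[k] t^k."""
    # Only even powers contribute: the integral of t^k is 2/(k+1) for even k, 0 for odd k.
    return sum((Fraction(2 * a, k + 1) for k, a in enumerate(p) if k % 2 == 0), Fraction(0))

def split_constant(p):
    """Write f = c + g with c constant and g in V1 (zero integral over [-1, 1])."""
    c = integral_m1_1(p) / 2
    g = list(p)
    g[0] = g[0] - c   # subtracting a constant only changes the coefficient of t^0
    return c, g

f = [1, 1, 3]              # f(t) = 1 + t + 3t^2
c, g = split_constant(f)
print(c, g)                # 2 [Fraction(-1, 1), 1, 3]
print(integral_m1_1(g))    # 0
```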


4.2.1 Problems for Practice


1. Show that none of the following sets of vectors is a subspace of $R^3$:


(a) The set U of vectors $x = (x_1, x_2, x_3)$ such that $x_1^2 + x_2^2 + x_3^2 = 1$.
(b) The set V of vectors in $R^3$ all of whose coordinates are integers.
(c) The set W of vectors in $R^3$ that have at least one coordinate equal to 0.


2. Determine if U is a subspace of $M_2(R)$, where


(a) U is the set of $2 \times 2$ matrices such that the sum of the entries in the first
column is 0.

(b) U is the set of $2 \times 2$ matrices such that the product of the entries in the first
column is 0.


3. Is R a subspace of the C-vector space C?

4. Let V be the set of all periodic sequences of real numbers. Is V a subspace of
the space of all sequences of real numbers?


5. Let V be the set of vectors $(x, y, z) \in R^3$ such that $x(y^2 + z^2) = 0$. Is V a
subspace of $R^3$?


6. Let V be the set of twice differentiable functions $f : R \to R$ such that for all
x we have
Sf" (x) + x? f'(x) — 3f (x) = 0. 


Is V a subspace of the space of all maps $f : R \to R$?


7. Let V be the set of differentiable functions $f : R \to R$ such that for all x we
have


f(x) — fx? = x. 


Is V a subspace of the space of all maps $f : R \to R$?


8. a) Is the set of bounded sequences of real numbers a vector subspace of the
space of all sequences of real numbers?
b) Answer the same question if instead of bounded sequences we consider
monotonic sequences.


9. Let V be the set of all sequences $(x_n)_{n \ge 0}$ of real numbers such that


$$x_{n+2} + nx_{n+1} - (n - 1)x_n = 0$$


for all $n \ge 0$. Prove that V is a subspace of the space of all sequences of real
numbers.

10. Let V be the space of all real-valued maps on R and let W be the subset of V
consisting of maps f such that $f(0) + f(1) = 0$.


a) Check that W is a subspace of V.
b) Find a subspace S of V such that $V = W \oplus S$.
11. Let V be the space of continuously differentiable maps $f : R \to R$ and let
W be the subspace of those maps f for which $f(0) = f'(0) = 0$. Let Z be
the subspace of V consisting of maps $x \mapsto ax + b$, with $a, b \in R$. Prove that
$V = W \oplus Z$.

12. Let V be the space of convergent sequences of real numbers. Let W be the
subset of V consisting of sequences converging to 0 and let Z be the subset of
V consisting of constant sequences. Prove or disprove that W, Z are subspaces
of V and $W \oplus Z = V$.

13. (Quotient space) Let V be a vector space over F and let $W \subset V$ be a subspace.
For a vector $v \in V$, let $[v] = \{v + w : w \in W\}$. Note that $[v_1] = [v_2]$ if
$v_1 - v_2 \in W$. Define the quotient space V/W to be $\{[v] : v \in V\}$. Define
an addition and scalar multiplication on V/W by $[u] + [v] = [u + v]$ and
$a[v] = [av]$. Prove that the addition and multiplication above are well defined
and V/W equipped with these operations is a vector space.

14. Let $F \in \{R, C\}$ and let V be a nonzero vector space over F. Suppose that V
is the union of finitely many subspaces of V. Prove that one of these subspaces
is V.


4.3 Linear Combinations and Span 


Let V be a vector space over a field F and let $v_1, v_2, \ldots, v_n$ be vectors in V. By
definition, V contains all vectors $c_1v_1 + \cdots + c_nv_n$, with $c_1, \ldots, c_n \in F$. The
collection of all these vectors plays a very important role in the sequel and so
deserves a formal definition:


Definition 4.22. Let $v_1, v_2, \ldots, v_n$ be vectors in a vector space V over F.
a) A vector $v \in V$ is a linear combination of $v_1, v_2, \ldots, v_n$ if there are scalars
$c_1, c_2, \ldots, c_n \in F$ such that
$$v = c_1v_1 + c_2v_2 + \cdots + c_nv_n. \tag{4.1}$$
b) The span of $v_1, \ldots, v_n$ is the subset of V consisting of all linear combinations of
$v_1, v_2, \ldots, v_n$. It is denoted $\mathrm{Span}(v_1, v_2, \ldots, v_n)$.


Example 4.23. 1) The span Span(v) of a single vector v in $R^n$ consists of all re-scaled
copies of v (we also say all scalar multiples of v). Using the geometric
interpretation of vectors in $R^2$ (or $R^3$), if $v \neq 0$ then Span(v) is represented by
the line through the origin in the direction of the vector v.

2) Let $e_1 = (1, 0, 0)$ and $e_2 = (0, 1, 0)$. Then


$$x_1e_1 + x_2e_2 = (x_1, x_2, 0).$$

Since $x_1$ and $x_2$ are arbitrary we see that $\mathrm{Span}(e_1, e_2)$ consists of all vectors in $R^3$
whose third coordinate is 0. This is the $x_1x_2$-plane in $R^3$. In general, if two vectors
$v_1$ and $v_2$ in $R^3$ are not collinear, then their span is the unique plane through the
origin that contains them.


Problem 4.24. Show that the vector (1, 1, 1) cannot be expressed as a linear
combination of the vectors


$$v_1 = (1, 0, 0), \quad v_2 = (0, 1, 0) \quad \text{and} \quad v_3 = (1, 1, 0).$$

Solution. An arbitrary linear combination


$$x_1v_1 + x_2v_2 + x_3v_3 = (x_1 + x_3, x_2 + x_3, 0)$$


of $v_1$, $v_2$ and $v_3$ has 0 as the third coordinate, and so cannot be equal to (1, 1, 1). □


More generally, let us consider the following practical problem: given a family
of vectors $v_1, v_2, \ldots, v_k$ in $F^n$ and a vector $v \in F^n$, decide whether this vector is a
linear combination of $v_1, \ldots, v_k$, that is, whether $v \in \mathrm{Span}(v_1, \ldots, v_k)$. Consider the $n \times k$
matrix A whose columns are $v_1, \ldots, v_k$. Saying that $v \in \mathrm{Span}(v_1, \ldots, v_k)$ is the
same as saying that we can find $x_1, \ldots, x_k \in F$ such that $v = x_1v_1 + \cdots + x_kv_k$,
or equivalently that the system $AX = v$ is consistent (and then $x_1, \ldots, x_k$ are given
by the coordinates of X). Since we have a practical way of deciding whether this
system is consistent (via row-reduction of the augmented matrix $[A|v]$), we
have an algorithmic solution to this problem. Of course, Problem 4.24 can also be
solved via this method.
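The procedure just described can be sketched in Python for F = Q. This is our own illustration, not code from the text: `rref` is a plain Gauss-Jordan elimination with exact `Fraction` arithmetic, and `in_span` builds the augmented matrix $[A|v]$ (with $v_1, \ldots, v_k$ as the columns of A) and declares the system inconsistent exactly when some row reduces to $[0 \cdots 0 \mid b]$ with $b \neq 0$. Both names are hypothetical.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M (a list of rows) over Q, using exact Fractions."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if R[r][col] != 0), None)
        if pivot is None:
            continue
        R[pivot_row], R[pivot] = R[pivot], R[pivot_row]
        p = R[pivot_row][col]
        R[pivot_row] = [x / p for x in R[pivot_row]]          # make the pivot equal to 1
        for r in range(rows):
            if r != pivot_row and R[r][col] != 0:              # clear the rest of the column
                factor = R[r][col]
                R[r] = [a - factor * b for a, b in zip(R[r], R[pivot_row])]
        pivot_row += 1
    return R

def in_span(vectors, v):
    """Decide whether v lies in Span(vectors) by row-reducing [A | v]."""
    n = len(v)
    augmented = [[u[i] for u in vectors] + [v[i]] for i in range(n)]
    R = rref(augmented)
    # The system AX = v is inconsistent iff some row reads [0 ... 0 | nonzero].
    return not any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in R)
```

For instance, applied to the vectors of Problem 4.25, `in_span` returns False for $v = (1, 0, 0, 0)$ and True for $w = (4, 4, 3, 3)$, in accordance with the solution given there.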


Problem 4.25. Consider the vectors $v_1 = (1, 0, 1, 2)$, $v_2 = (3, 4, 2, 1)$ and $v_3 =
(5, 8, 3, 0)$. Is the vector $v = (1, 0, 0, 0)$ in the span of $\{v_1, v_2, v_3\}$? What about the
vector $w = (4, 4, 3, 3)$?


Solution. In order to solve this problem, we use the method described above.
Namely, we consider the matrix

$$A = \begin{bmatrix} 1 & 3 & 5 \\ 0 & 4 & 8 \\ 1 & 2 & 3 \\ 2 & 1 & 0 \end{bmatrix}.$$

We want to know if the system $AX = v$ is consistent. The row-reduction of the
augmented matrix $[A|v]$ is

$$[A|v] \sim \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$


124 4 Vector Spaces and Subspaces 


Looking at the third row in the matrix appearing in the right-hand side, we see that 
the system is not consistent, thus v is not in the span of {v1, v2, v3}. 

For the vector w, we use the same method. The row-reduction of the augmented
matrix $[A|w]$ is now

$$[A|w] \sim \begin{bmatrix} 1 & 0 & -1 & 1 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$

which shows that the system is consistent and so w is in the span of $\{v_1, v_2, v_3\}$. If we
want to explicitly find the linear combination of $v_1, v_2, v_3$ giving w, all we need is to
solve the system

$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}.$$

This yields without any problem $x_1 = x_3 + 1$ and $x_2 = 1 - 2x_3$. Thus we can write

$$w = (1 + x_3)v_1 + (1 - 2x_3)v_2 + x_3v_3,$$

and this for any choice of $x_3$. We can take for instance $x_3 = 0$ and obtain $w =
v_1 + v_2$. □
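The one-parameter family of solutions found above can be double-checked directly. The Python sketch below (an illustration; `combo` is a hypothetical helper) evaluates $(1 + x_3)v_1 + (1 - 2x_3)v_2 + x_3v_3$ for a few values of $x_3$ and also verifies the relation $v_1 - 2v_2 + v_3 = 0$, which is what makes $x_3$ a free parameter.

```python
from fractions import Fraction

v1, v2, v3 = (1, 0, 1, 2), (3, 4, 2, 1), (5, 8, 3, 0)
w = (4, 4, 3, 3)

def combo(coeffs, vectors):
    """Linear combination c_1 u_1 + ... + c_k u_k, computed coordinate by coordinate."""
    return tuple(sum(c * u[i] for c, u in zip(coeffs, vectors)) for i in range(len(vectors[0])))

for x3 in [Fraction(0), Fraction(1), Fraction(-3, 2)]:
    print(combo((1 + x3, 1 - 2 * x3, x3), (v1, v2, v3)) == w)   # True each time

print(combo((1, -2, 1), (v1, v2, v3)))   # (0, 0, 0, 0)
```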


The following result is easily proved, but explains the importance of the notion 
of span: 


Proposition 4.26. Let $V$ be a vector space over $F$ and let $v_1, v_2, \dots, v_n \in V$. 

Then 

a) $\mathrm{Span}(v_1, v_2, \dots, v_n)$ is the intersection of all subspaces of $V$ which contain 
$v_1, v_2, \dots, v_n$. 

b) $\mathrm{Span}(v_1, v_2, \dots, v_n)$ is the smallest vector subspace of $V$ which contains 
$v_1, v_2, \dots, v_n$.


Proof. Since an arbitrary intersection of vector subspaces is a vector subspace, part 
a) implies part b), so we will focus on the proof of part a). 

First, let us prove that $\mathrm{Span}(v_1, v_2, \dots, v_n)$ is contained in every vector subspace 
$W$ of $V$ that contains $v_1, v_2, \dots, v_n$. This will imply that $\mathrm{Span}(v_1, v_2, \dots, v_n)$ is 
contained in the intersection of all such subspaces $W$. Now, since $W$ is a subspace of 
$V$ and since $v_1, v_2, \dots, v_n \in W$, we also have $c_1v_1 + c_2v_2 + \dots + c_nv_n \in W$ for all 
scalars $c_1, c_2, \dots, c_n \in F$. Thus $W$ contains all linear combinations of $v_1, v_2, \dots, v_n$, 
i.e., it contains $\mathrm{Span}(v_1, v_2, \dots, v_n)$. 

It remains to see that $\mathrm{Span}(v_1, v_2, \dots, v_n)$ is a vector subspace of $V$ (as it contains 
$v_1, v_2, \dots, v_n$, this will imply that it contains the intersection of all vector subspaces 
containing $v_1, v_2, \dots, v_n$). So let $x, y \in \mathrm{Span}(v_1, v_2, \dots, v_n)$ and let $c \in F$ be a scalar. 
Since $x, y$ are linear combinations of $v_1, v_2, \dots, v_n$, we can write $x = a_1v_1 + a_2v_2 + \dots + a_nv_n$ and $y = b_1v_1 + b_2v_2 + \dots + b_nv_n$ for some scalars $a_1, \dots, a_n$ and 
$b_1, \dots, b_n$. Then 

$$x + cy = (a_1 + cb_1)v_1 + (a_2 + cb_2)v_2 + \dots + (a_n + cb_n)v_n$$

is also a linear combination of $v_1, v_2, \dots, v_n$, thus it belongs to $\mathrm{Span}(v_1, v_2, \dots, v_n)$. 
The result follows. □


Remark 4.27. It follows from the previous proposition and Problem 4.15 that 

$$\mathrm{Span}(v_1, v_2, \dots, v_n) = \sum_{i=1}^{n} Fv_i,$$

where $Fv_i$ is the subspace of $V$ consisting of all multiples $cv_i$ of $v_i$ (equivalently, 
$Fv_i = \mathrm{Span}(v_i)$).


We can slightly extend the previous definition and results by considering arbitrary 
subsets of $V$: 


Definition 4.28. Let $S$ be a subset of $V$. 


a) $\mathrm{Span}(S)$ is the subset of $V$ consisting of all linear combinations $c_1v_1 + c_2v_2 + \dots + c_nv_n$, where $\{v_1, v_2, \dots, v_n\}$ is a finite subset of $S$ and $c_1, c_2, \dots, c_n$ are 
scalars. 

b) We say that $S$ is a spanning set or generating set for $V$ if $\mathrm{Span}(S) = V$.


Example 4.29. 1) Consider the space $V = F^n$ and the canonical basis 

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$

Then $e_1, \dots, e_n$ is a spanning set for $F^n$, since any vector $X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}$ can be written $X = x_1e_1 + x_2e_2 + \dots + x_ne_n$. 

2) Similarly, consider the space $V = M_{m,n}(F)$ of $m \times n$ matrices with entries in $F$. If $E_{ij}$ is the matrix in $V$ having the $(i,j)$-entry equal to 1 and all other entries 0, then the family $(E_{ij})_{1 \le i \le m, 1 \le j \le n}$ is a spanning family for $V$, since any matrix $A = [a_{ij}]$ can be written 

$$A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}E_{ij}.$$

3) In the space $\mathbb{R}_n[X]$ of polynomials with real coefficients and degree bounded by $n$, the family $1, X, \dots, X^n$ is spanning.


Similarly, one can prove (or deduce from the previous proposition) that for an 
arbitrary subset $S$ of $V$, the set $\mathrm{Span}(S)$ is the smallest vector subspace of $V$ which 
contains $S$. Note the very useful implication 

$$\text{if } S_1 \subset S_2, \text{ then } \mathrm{Span}(S_1) \subset \mathrm{Span}(S_2). \qquad (4.2)$$

Indeed, $\mathrm{Span}(S_2)$ is a vector subspace containing $S_2$, thus also $S_1$; hence it contains 
$\mathrm{Span}(S_1)$. Alternatively, this follows from the fact that any linear combination of 
finitely many elements of $S_1$ is also a linear combination of finitely many elements 
of $S_2$. It follows from relation (4.2) that any subset of $V$ containing a spanning 
set for $V$ is a spanning set for $V$.

Row-reduction is also very useful in understanding $\mathrm{Span}(v_1, \dots, v_k)$ when 
$v_1, \dots, v_k \in F^n$. Indeed, consider the $k \times n$ matrix $A$ whose rows are the coordinates 
of the vectors $v_1, \dots, v_k$ in the canonical basis of $F^n$. Performing elementary 
operations on the rows of $A$ does not affect the span of the set of its rows, hence 
$\mathrm{Span}(v_1, \dots, v_k)$ is precisely the span of the rows of $A_{\mathrm{ref}}$, where we recall that 
$A_{\mathrm{ref}}$ is the reduced row-echelon form of $A$ (of course, it suffices to consider only 
the nonzero rows of $A_{\mathrm{ref}}$). In practice, this gives a quite manageable description of 
$\mathrm{Span}(v_1, \dots, v_k)$.
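Here is a minimal sketch of this procedure, assuming rational coordinates and exact arithmetic through Python's `fractions` module (the helper names `rref` and `span_description` are ours). The nonzero rows of $A_{\mathrm{ref}}$ are exactly its first $r$ rows, where $r$ is the number of pivots; the usage lines apply it to the vectors of Example 4.30.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q with exact fractions.

    Returns (R, pivots): R is the reduced matrix as a list of rows and
    pivots lists the pivot columns in increasing order."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # row that will receive the next pivot
    for c in range(len(R[0]) if R else 0):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]  # scale the pivot to 1
        for i in range(len(R)):
            if i != r and R[i][c] != 0:  # clear column c in every other row
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def span_description(vectors):
    """Nonzero rows of the RREF of the matrix whose rows are `vectors`;
    they span the same subspace as the original vectors."""
    R, pivots = rref(vectors)
    return R[:len(pivots)]


for row in span_description([(1, 2, 3, 4), (3, 1, 2, 1), (1, 2, 1, 2)]):
    print([str(x) for x in row])
# ['1', '0', '0', '-3/5']
# ['0', '1', '0', '4/5']
# ['0', '0', '1', '1']
```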


Example 4.30. Consider the vectors $v_1 = (1,2,3,4)$, $v_2 = (3,1,2,1)$ and 
$v_3 = (1,2,1,2)$ in $\mathbb{R}^4$. We would like to obtain a simple description of $V = \mathrm{Span}(v_1, v_2, v_3)$. 

Consider the matrix 

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 1 \\ 1 & 2 & 1 & 2 \end{pmatrix},$$

whose first row is given by the coordinates 1, 2, 3, 4 of $v_1$ with respect to the 
canonical basis of $\mathbb{R}^4$, and similarly for the second and third rows (replacing $v_1$ with 
$v_2$ and $v_3$ respectively). Row-reduction yields 

$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & -\frac{3}{5} \\ 0 & 1 & 0 & \frac{4}{5} \\ 0 & 0 & 1 & 1 \end{pmatrix}.$$

Thus 

$$V = \mathrm{Span}\left( \left(1, 0, 0, -\tfrac{3}{5}\right), \left(0, 1, 0, \tfrac{4}{5}\right), (0, 0, 1, 1) \right),$$

and this is the same as the set of vectors 

$$w = \left(a, b, c, -\tfrac{3}{5}a + \tfrac{4}{5}b + c\right)$$

with $a, b, c \in \mathbb{R}$.


4.3.1 Problems for Practice 


1. Find at least three different ways to express the matrix 


as a linear combination of the matrices 


Aj = l I r AÁ = hal and A3 = LRO : 
—1 —1 -1 1 —1 0 


2. Show that the vector $(1,1,1)$ cannot be expressed as a linear combination of 
$a_1 = (1,-1,0)$, $a_2 = (1,0,-1)$ and $a_3 = (0,1,-1)$. 


3. Let $W$ be the subset of $\mathbb{R}^n$ consisting of those vectors whose sum of coordinates 
equals 0. Let $Z$ be the span of $(1,1,\dots,1)$ in $\mathbb{R}^n$. Prove or disprove that 
$W \oplus Z = \mathbb{R}^n$. 

4. Let $P$ be the span of $(1,1,1)$ and $(1,1,-1)$ in $\mathbb{R}^3$, and let $D$ be the span of 
$(0,1,-1)$. Is it true that $P \oplus D = \mathbb{R}^3$? 

5. One of the vectors $b_1 = (3,-7,-6)$ and $b_2 = (0,2,4)$ is in the plane spanned 
by the vectors $v_1 = (1,0,-1)$ and $v_2 = (1,-7,-4)$. Determine which one and 
write it as a linear combination of the vectors $v_1$ and $v_2$. Also, prove that the other 
vector is not in the plane spanned by $v_1$ and $v_2$. 

6. Let $V$ be the vector space of real-valued maps on $\mathbb{R}$ and let $f_n$ (respectively $g_n$) 
be the map sending $x$ to $\cos(nx)$ (respectively $\cos^n(x)$). Prove or disprove that 

$$\mathrm{Span}(\{f_n \mid n \ge 0\}) = \mathrm{Span}(\{g_n \mid n \ge 0\}).$$
Consider some vectors $v_1, v_2, \dots, v_n$ in a vector space $V$ over $F$, and a vector $v$ in 
$\mathrm{Span}(v_1, v_2, \dots, v_n)$. By definition, there are scalars $c_1, c_2, \dots, c_n$ such that 

$$v = c_1v_1 + c_2v_2 + \dots + c_nv_n.$$

There is nothing in the definition of the span that requires the scalars $c_1, c_2, \dots, c_n$ in this relation 
to be unique.


Problem 4.31. Let $v_1, v_2, v_3$ be three vectors in $\mathbb{R}^n$ such that $3v_1 + v_2 + v_3 = 0$ 
and let $v = v_1 + v_2 - 2v_3$. Find infinitely many different ways to write $v$ as a linear 
combination of $v_1, v_2, v_3$. 


Solution. Let $\alpha$ be an arbitrary real number. Multiplying both sides of the equality 
$3v_1 + v_2 + v_3 = 0$ by $\alpha$ and adding the resulting relation to the 
equality $v = v_1 + v_2 - 2v_3$ yields 

$$v = (3\alpha + 1)v_1 + (\alpha + 1)v_2 + (\alpha - 2)v_3.$$

Thus each value of $\alpha$ provides a different way to write $v$ as a linear combination of 
$v_1, v_2, v_3$. □
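To see this identity at work on concrete data, the sketch below makes an illustrative choice that is not in the text: $n = 2$, $v_1 = (1,0)$, $v_2 = (0,1)$, and hence $v_3 = -3v_1 - v_2 = (-3,-1)$. It checks that every tested value of $\alpha$ produces the same vector $v$. The helper `combine` is a hypothetical name of ours.

```python
from fractions import Fraction

# Illustrative choice (not from the text): any v1, v2 work, with v3 = -3 v1 - v2.
v1 = (1, 0)
v2 = (0, 1)
v3 = tuple(-3 * a - b for a, b in zip(v1, v2))  # so that 3 v1 + v2 + v3 = 0


def combine(coeffs, vectors):
    """Coordinate-wise linear combination of the given vectors."""
    return tuple(sum(c * vec[i] for c, vec in zip(coeffs, vectors))
                 for i in range(len(vectors[0])))


v = combine((1, 1, -2), (v1, v2, v3))  # v = v1 + v2 - 2 v3

for alpha in (0, 1, -2, Fraction(5, 7)):
    coeffs = (3 * alpha + 1, alpha + 1, alpha - 2)
    assert combine(coeffs, (v1, v2, v3)) == v
print(v)  # (7, 3)
```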


Suppose now that a vector $v$ can be written as $v = a_1v_1 + a_2v_2 + \dots + a_nv_n$. If 
$b_1, b_2, \dots, b_n$ are scalars such that we also have $v = b_1v_1 + b_2v_2 + \dots + b_nv_n$, then 
subtracting the two relations we obtain 

$$0 = (a_1 - b_1)v_1 + (a_2 - b_2)v_2 + \dots + (a_n - b_n)v_n.$$

Thus we would be able to conclude that $a_1, a_2, \dots, a_n$ are unique if the equation 
(with $z_1, \dots, z_n \in F$) 

$$z_1v_1 + z_2v_2 + \dots + z_nv_n = 0$$

forced $z_1 = \dots = z_n = 0$. As we said above, this is not always the case: take, 
for instance, $n = 1$ and $v_1 = 0$; then $a_1v_1 = 0$ for any choice of the scalar $a_1$. On the 
other hand, vectors $v_1, \dots, v_n$ having this uniqueness property play a fundamental 
role in linear algebra, and they deserve a formal definition:


Definition 4.32. a) Vectors $v_1, v_2, \dots, v_n$ in some vector space $V$ are linearly 
dependent if there is a relation 

$$c_1v_1 + c_2v_2 + \dots + c_nv_n = 0$$

for which at least one of the scalars $c_1, c_2, \dots, c_n$ is nonzero. 

b) Vectors $v_1, v_2, \dots, v_n$ in the vector space $V$ are linearly independent if, whenever we have scalars $a_1, a_2, \dots, a_n$ with $a_1v_1 + a_2v_2 + \dots + a_nv_n = 0$, then 

$$a_1 = a_2 = \dots = a_n = 0.$$

Example 4.33. In all situations considered in Example 4.29, the corresponding 
generating family is also linearly independent.


Before going on to more abstract things, let us consider the following very 
concrete problem: given some vectors $v_1, \dots, v_k$ in $F^n$ (take for simplicity $F = \mathbb{R}$), 
decide whether they are linearly independent. We claim that this problem can be 
solved algorithmically in a fairly simple way. Indeed, we need to know if we can 
find $x_1, \dots, x_k \in F$, not all equal to 0, such that 

$$x_1v_1 + \dots + x_kv_k = 0.$$

Let $A$ be the $n \times k$ matrix whose columns are given by the coordinates of $v_1, \dots, v_k$ 
with respect to the canonical basis of $F^n$. Then the previous relation is equivalent to 
$AX = 0$, where $X$ is the column vector with coordinates $x_1, \dots, x_k$. Thus $v_1, \dots, v_k$ are linearly independent if and only if the homogeneous linear system $AX = 0$ 
has only the trivial solution $X = 0$. We know that this problem can be solved algorithmically, 
via the row-reduction algorithm: let $A_{\mathrm{ref}}$ be the reduced row-echelon form of 
$A$. If there is a pivot in every column of $A_{\mathrm{ref}}$, then $v_1, \dots, v_k$ are linearly 
independent; otherwise they are not. Thus the original problem can also be solved 
algorithmically. Also, note that since every homogeneous linear system with more 
variables than equations has a nontrivial solution, we deduce that if we have more 
than $n$ vectors in $F^n$, then they are never linearly independent! Thus sometimes 
we can solve the original problem with absolutely no effort, simply by counting the 
number of vectors we are given!
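The test above, together with the recipe for extracting a dependency relation used in the next problem, can be sketched as follows. This is a minimal illustration under our own assumptions (rational coordinates, exact `fractions` arithmetic, hypothetical helper names `rref` and `dependency_relation`): if some column of $A_{\mathrm{ref}}$ has no pivot, the corresponding free variable is set to 1, the other free variables to 0, and the pivot variables are read off from $A_{\mathrm{ref}}X = 0$.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q with exact fractions.

    Returns (R, pivots): R is the reduced matrix as a list of rows and
    pivots lists the pivot columns in increasing order."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # row that will receive the next pivot
    for c in range(len(R[0]) if R else 0):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]  # scale the pivot to 1
        for i in range(len(R)):
            if i != r and R[i][c] != 0:  # clear column c in every other row
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def dependency_relation(vectors):
    """Return None if the vectors are linearly independent; otherwise return
    a nontrivial list x with x[0]*v_1 + ... + x[k-1]*v_k = 0."""
    k, n = len(vectors), len(vectors[0])
    A = [[vectors[j][i] for j in range(k)] for i in range(n)]  # columns = vectors
    R, pivots = rref(A)
    free = [c for c in range(k) if c not in pivots]
    if not free:
        return None  # pivot in every column: only the trivial solution
    f = free[0]
    x = [Fraction(0)] * k
    x[f] = Fraction(1)  # this free variable is 1, the other free ones are 0
    for row, p in enumerate(pivots):
        x[p] = -R[row][f]  # this row of R reads x_p + R[row][f] * x_f = 0
    return x


vs = [(1, 2, 3, 4, 5), (2, 3, 4, 5, 1), (1, 3, 5, 7, 9), (3, 5, 7, 9, 1)]
print([str(c) for c in dependency_relation(vs)])  # ['2', '-2', '-1', '1']
```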


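Problem 4.34. Consider the vectors $v_1 = (1,2,3,4,5)$, $v_2 = (2,3,4,5,1)$, $v_3 = (1,3,5,7,9)$, $v_4 = (3,5,7,9,1)$ in $\mathbb{R}^5$. Are these vectors linearly independent? If 
the answer is negative, give a nontrivial linear dependency relation between these 
vectors. 


Solution. We consider the matrix 

$$A = \begin{pmatrix} 1 & 2 & 1 & 3 \\ 2 & 3 & 3 & 5 \\ 3 & 4 & 5 & 7 \\ 4 & 5 & 7 & 9 \\ 5 & 1 & 9 & 1 \end{pmatrix}.$$

Row-reduction yields 

$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$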
Since there is no pivot in the last column, the vectors $v_1, v_2, v_3, v_4$ are linearly 
dependent. 

To find a nontrivial linear dependency relation, we solve the system $AX = 0$, 
which is equivalent to the system $A_{\mathrm{ref}}X = 0$. This system is further equivalent to 

$$x_1 = 2x_4, \quad x_2 = -2x_4, \quad x_3 = -x_4.$$

Taking $x_4 = 1$ (we can take any nonzero value we like), we obtain the dependency 
relation 

$$2v_1 - 2v_2 - v_3 + v_4 = 0.$$

□ 

Problem 4.35. Show that the 4 vectors 

$$v_1 = (2,1,3,1), \quad v_2 = (-1,0,1,2), \quad v_3 = (3,2,7,4), \quad v_4 = (1,2,0,-1)$$

are linearly dependent, and find three of them that are linearly independent. 


Solution. Row reduction of the matrix whose columns are the coordinates of $v_1, v_2, v_3, v_4$ yields 

$$\begin{pmatrix} 2 & -1 & 3 & 1 \\ 1 & 0 & 2 & 2 \\ 3 & 1 & 7 & 0 \\ 1 & 2 & 4 & -1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

Thus the 4 vectors are dependent. Eliminating the vector $v_3$ (the one that does 
not have a pivot in its column) yields the linearly independent set of vectors 
$\{v_1, v_2, v_4\}$. □


One may argue that the above definition is a little bit restrictive in the sense that 
it only deals with finite families of vectors. If we had an infinite family $(v_i)_{i \in I}$ of 
vectors of $V$, we would not be able to give a meaning to the infinite sum $\sum_{i \in I} c_iv_i$ 
for an arbitrary choice of the scalars $c_i$. However, if all but finitely many of the scalars $c_i$ 
were 0, then the previous sum would be a finite sum and would thus make sense. So 
one can extend the previous definition by saying that the family $(v_i)_{i \in I}$ is linearly 
dependent if one can find scalars $(c_i)_{i \in I}$ such that all but finitely many are 0, not all 
of them are 0, and $\sum_{i \in I} c_iv_i = 0$. Equivalently, and perhaps easier to understand, 
an arbitrary family is linearly dependent if there is a finite subfamily which 
is linearly dependent. A family of vectors is linearly independent if every finite 
subfamily is linearly independent. Thus, a (possibly infinite) set $L$ is linearly 
independent if whenever we have distinct elements $l_1, \dots, l_n \in L$ and scalars 
$a_1, a_2, \dots, a_n$ with $a_1l_1 + a_2l_2 + \dots + a_nl_n = 0$, then $a_1 = a_2 = \dots = a_n = 0$.
Remark 4.36. We note the following simple but extremely useful facts: 


a) A subfamily of a linearly independent family is linearly independent. Indeed, 
let $(v_i)_{i \in I}$ be a linearly independent family and let $J$ be a subset of $I$. Assume 
that $(v_j)_{j \in J}$ is linearly dependent; thus (by definition) we can find a finite linearly 
dependent subfamily $v_{i_1}, \dots, v_{i_n}$ with $i_1, \dots, i_n \in J$. But $i_1, \dots, i_n \in I$, thus 
$v_{i_1}, \dots, v_{i_n}$ is a finite linearly dependent subfamily of the linearly independent 
family $(v_i)_{i \in I}$, a contradiction. 

b) If two vectors in a family of vectors are equal, then this family is automatically linearly dependent. Indeed, say a vector $v$ appears at least twice in the family $(v_i)_{i \in I}$, and suppose that this family is linearly independent. Then by part a), the subfamily $v, v$ should 
be linearly independent. But this is absurd, since an obvious nontrivial linear 
dependency relation is $1 \cdot v + (-1)v = 0$.


Problem 4.37. Let $V$ be the vector space of all real-valued maps on $\mathbb{R}$. Prove that 
the maps $x \mapsto |x - 1|$, $x \mapsto |x - 2|$, $\dots$, $x \mapsto |x - 10|$ are linearly independent. 


Solution. Let $f_i(x) = |x - i|$ for $1 \le i \le 10$ and suppose that 

$$a_1f_1 + a_2f_2 + \dots + a_{10}f_{10} = 0$$

for some real numbers $a_1, \dots, a_{10}$. Suppose that some $a_i$ is nonzero. Dividing by 
$a_i$, we obtain that $f_i$ is a linear combination of $f_1, \dots, f_{i-1}, f_{i+1}, \dots, f_{10}$. But 
$f_1, \dots, f_{i-1}, f_{i+1}, \dots, f_{10}$ are all differentiable at $i$, hence $f_i$ is also differentiable 
at $i$. This is absurd, since $f_i$ is not differentiable at $i$; hence $a_i = 0$ for all $1 \le i \le 10$, and the result 
follows. □


One can relate the notion of span to that of linear dependence, as the 
following proposition shows. It essentially says that a set $\{v_1, v_2, \dots, v_n\}$ is linearly 
dependent if and only if one of the vectors $v_1, \dots, v_n$ is a linear combination 
of the other vectors. Note that we used the word set and not family; that is, in the 
above statement we assume that $v_1, \dots, v_n$ are pairwise distinct (as we observed in 
Remark 4.36 b), if two vectors among $v_1, \dots, v_n$ are equal, then 
the family $v_1, \dots, v_n$ is automatically linearly dependent). 


Proposition 4.38. Let $S$ be a set of vectors in some vector space $V$. Then $S$ is 
linearly dependent if and only if there is $v \in S$ such that $v \in \mathrm{Span}(S \setminus \{v\})$.


Proof. We deal separately with each implication. First, suppose that $S$ is linearly 
dependent. By definition, this means that we can find finitely many vectors 
$v_1, v_2, \dots, v_n \in S$ and some scalars $a_1, a_2, \dots, a_n$, not all 0, such that 

$$a_1v_1 + a_2v_2 + \dots + a_nv_n = 0.$$

Note that $v_1, \dots, v_n$ are pairwise distinct, since the elements of $S$ are assumed to be 
pairwise distinct. 

Since not all scalars are 0, there is $i \in \{1, 2, \dots, n\}$ such that $a_i \neq 0$. Dividing 
the previous equality by $a_i$, we obtain 

$$\frac{a_1}{a_i}v_1 + \dots + \frac{a_{i-1}}{a_i}v_{i-1} + v_i + \frac{a_{i+1}}{a_i}v_{i+1} + \dots + \frac{a_n}{a_i}v_n = 0,$$

hence 

$$v_i = -\frac{a_1}{a_i}v_1 - \dots - \frac{a_{i-1}}{a_i}v_{i-1} - \frac{a_{i+1}}{a_i}v_{i+1} - \dots - \frac{a_n}{a_i}v_n.$$

We deduce that $v_i$ belongs to the span of $v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n$, which is 
contained in the span of $S \setminus \{v_i\}$, as $\{v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n\} \subset S \setminus \{v_i\}$. This 
proves one implication. 

Next, suppose that there is $v \in S$ such that $v \in \mathrm{Span}(S \setminus \{v\})$. That means that 
we can find $v_1, v_2, \dots, v_n \in S \setminus \{v\}$ and scalars $a_1, a_2, \dots, a_n$ such that 

$$v = a_1v_1 + a_2v_2 + \dots + a_nv_n.$$

But then 

$$1 \cdot v + (-a_1)v_1 + \dots + (-a_n)v_n = 0,$$

and the vectors $v, v_1, \dots, v_n$ are linearly dependent. Since $v \notin \{v_1, \dots, v_n\}$, it follows 
that $S$ has a finite subset which is linearly dependent, and so $S$ is linearly dependent. 
The result follows. □


The following rather technical and subtle result (the Steinitz exchange lemma) 
is the fundamental theorem in the basic theory of vector spaces. We will deduce 
from it many highly nontrivial results, which will help us build the theory of finite 
dimensional vector spaces.


Theorem 4.39 (Exchange lemma). Let $L = \{v_1, v_2, \dots, v_n\}$ and $S = \{w_1, w_2, \dots, w_m\}$ be two finite subsets of a vector space $V$, with $L$ linearly 
independent and $S$ a spanning set. Then $n \le m$ and we can find vectors $s_1, \dots, s_{m-n}$ 
in $S$ such that $L \cup \{s_1, s_2, \dots, s_{m-n}\}$ is a spanning set. 


Proof. The result will be proved by induction on $n$. There is nothing to be proved 
when $n = 0$, so assume that the result holds for $n$ and let us prove it for $n + 1$. Since 
$v_1, v_2, \dots, v_{n+1}$ are linearly independent, so are $v_1, v_2, \dots, v_n$ by Remark 4.36. Thus 
by the inductive hypothesis we already have $n \le m$ and the existence of vectors 
$s_1, \dots, s_{m-n}$ in $S$ such that $\{v_1, \dots, v_n, s_1, \dots, s_{m-n}\}$ is a spanning set. In particular, we 
can express $v_{n+1}$ as a linear combination 

$$v_{n+1} = a_1v_1 + \dots + a_nv_n + b_1s_1 + \dots + b_{m-n}s_{m-n}.$$


4.4 Linear Independence 133 


If $m = n$, then the previous relation can be written 

$$v_{n+1} = a_1v_1 + \dots + a_nv_n,$$

which contradicts the hypothesis that $v_1, v_2, \dots, v_{n+1}$ are linearly independent. Thus 
$m \neq n$, and since $n \le m$, we must have $n + 1 \le m$. The same argument also proves 
that at least one of $b_1, b_2, \dots, b_{m-n}$ is nonzero. Permuting the vectors $s_1, \dots, s_{m-n}$, 
we may assume that $b_1 \neq 0$. Dividing the relation 

$$v_{n+1} = a_1v_1 + \dots + a_nv_n + b_1s_1 + \dots + b_{m-n}s_{m-n}$$

by $b_1$ and rearranging terms yields 

$$s_1 = \frac{1}{b_1}v_{n+1} - \frac{a_1}{b_1}v_1 - \dots - \frac{a_n}{b_1}v_n - \frac{b_2}{b_1}s_2 - \dots - \frac{b_{m-n}}{b_1}s_{m-n},$$

which shows that $s_1 \in \mathrm{Span}(v_1, \dots, v_n, v_{n+1}, s_2, \dots, s_{m-n})$. Since this subspace therefore contains $v_1, \dots, v_n, s_1, \dots, s_{m-n}$, we get 

$$V = \mathrm{Span}(v_1, \dots, v_n, s_1, \dots, s_{m-n}) \subset \mathrm{Span}(v_1, \dots, v_n, v_{n+1}, s_2, \dots, s_{m-n}),$$

and $L \cup \{s_2, \dots, s_{m-n}\}$ is a spanning set, which is exactly what we needed. □
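Because the induction is constructive, it can be run on concrete vectors. The following is a minimal sketch under our own assumptions: vectors in $\mathbb{Q}^d$ given as tuples, exact `fractions` arithmetic, and hypothetical helper names `rref` and `exchange`. Each new $v_{n+1}$ is written as a combination of the current spanning list (one particular solution of the linear system, with free variables set to 0), and a kept vector of $S$ with nonzero coefficient $b_j$ is discarded, as in the proof.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q with exact fractions.

    Returns (R, pivots): R is the reduced matrix as a list of rows and
    pivots lists the pivot columns in increasing order."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # row that will receive the next pivot
    for c in range(len(R[0]) if R else 0):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]  # scale the pivot to 1
        for i in range(len(R)):
            if i != r and R[i][c] != 0:  # clear column c in every other row
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def exchange(L, S):
    """Follow the inductive proof of Theorem 4.39: insert the vectors of L one
    at a time into the spanning list, each time discarding one vector of S.
    Returns the vectors of S that are kept (len(S) - len(L) of them)."""
    kept = list(S)  # s_1, ..., s_{m-n} in the notation of the proof
    used = []       # v_1, ..., v_n already inserted
    for v in L:
        cols = used + kept
        augmented = [[c[i] for c in cols] + [v[i]] for i in range(len(v))]
        R, pivots = rref(augmented)
        if len(cols) in pivots:
            raise ValueError("S does not span a space containing L")
        # one particular solution: free variables 0, pivot variables read off
        coeffs = [Fraction(0)] * len(cols)
        for row, p in enumerate(pivots):
            coeffs[p] = R[row][-1]
        # some b_j must be nonzero, otherwise L would be linearly dependent
        js = [j for j in range(len(kept)) if coeffs[len(used) + j] != 0]
        if not js:
            raise ValueError("L is not linearly independent")
        del kept[js[0]]
        used.append(v)
    return kept


S = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(exchange([(1, 1, 0), (0, 1, 1)], S))  # [(0, 0, 1)]
```

Discarding the first kept vector with a nonzero coefficient is just one valid choice; the proof only needs some $b_j \neq 0$.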


Remark 4.40. One can slightly refine the previous theorem by no longer assuming 
that $L$ is finite (but still assuming that $S$ is finite). Indeed, any subset of $L$ is still 
linearly independent. Hence Theorem 4.39 shows that any finite subset of $L$ has size 
at most $m$, and hence $L$ is finite and has size $n \le m$.


4.4.1 Problems for Practice 


1. Are the vectors 

$$v_1 = (1,2,1), \quad v_2 = (-3,4,5), \quad v_3 = (0,2,-3)$$

linearly independent in $\mathbb{R}^3$? 

2. Consider the vectors 

$$v_1 = (1,2,1,3), \quad v_2 = (1,-1,1,-1), \quad v_3 = (3,0,3,1)$$

in $\mathbb{R}^4$. 


a) Prove that $v_1, v_2, v_3$ are not linearly independent. 
b) Express one of these vectors as a linear combination of the other two.
3. Let $V$ be the vector space of polynomials with real coefficients whose degree 
does not exceed 3. Are the following vectors


1+3X +X’, X?-3X4+1, 3X°?-X’?-xX-1 
linearly independent in V? 


4. Let $V$ be the space of all real-valued maps on $\mathbb{R}$. 


a) If $a_1 < \dots < a_n$ are real numbers, compute


n 
lim Seen), 
X—> OO * 


i=l 


b) Prove that the family of maps $(x \mapsto e^{ax})_{a \in \mathbb{R}}$ is linearly independent in $V$.


5. Let $V$ be the space of all maps $g : [0, \infty) \to \mathbb{R}$. For each $a \in (0, \infty)$ consider 
the map $f_a \in V$ defined by 

$$f_a(x) = \frac{1}{x + a}.$$

a) Let $a_1 < \dots < a_n$ be positive real numbers and suppose that $\alpha_1, \dots, \alpha_n$ are 
real numbers such that 

$$\sum_{i=1}^{n} \alpha_i f_{a_i}(x) = 0$$

for all $x \ge 0$. Prove that for all real numbers $x$ we have 

$$\sum_{i=1}^{n} \alpha_i \prod_{j \neq i} (x + a_j) = 0.$$

By making suitable choices of $x$, deduce that $\alpha_1 = \dots = \alpha_n = 0$. 

b) Prove that the family $(f_a)_{a > 0}$ is linearly independent in $V$.


6. Consider $V = \mathbb{R}$, seen as a vector space over $F = \mathbb{Q}$. 


a) Prove that $1, \sqrt{2}, \sqrt{3}$ is a linearly independent set in $V$. Hint: if $a, b, c$ are 
rational numbers such that $a + b\sqrt{2} + c\sqrt{3} = 0$, check that $a^2 + 2ab\sqrt{2} + 2b^2 = 3c^2$. 

b) Prove that the set of numbers $\ln p$, where $p$ runs over the prime numbers, is 
linearly independent in $V$.
7. a) If $m, n$ are nonnegative integers, compute 

$$\int_0^{2\pi} \cos(mx)\cos(nx)\,dx.$$

b) Deduce that the maps $x \mapsto \cos(nx)$, with $n$ a nonnegative integer, form a linearly 
independent set in the space of all real-valued maps on $\mathbb{R}$. 

8. Let $v_1, v_2, \dots, v_n$ be linearly independent vectors in $\mathbb{R}^n$. Is it always the case that 
$v_1, v_1 + v_2, \dots, v_1 + v_2 + \dots + v_n$ are linearly independent?


4.5 Dimension Theory 


We are now ready to develop the dimension theory of vector spaces. For general 
vector spaces, this is rather subtle, but we will stick to finite dimensional vector 
spaces, for which the arguments are rather elementary consequences of the subtle 
exchange lemma proved in the last section. We fix a field F and all vector spaces in 
this section will be over F. 


Definition 4.41. A vector space V is called finite dimensional if it has a finite 
spanning set. 


Thus $V$ is finite dimensional if we can find a finite family of vectors 
$v_1, v_2, \dots, v_n \in V$ such that all vectors in $V$ are linear combinations of 
$v_1, v_2, \dots, v_n$. For instance, the spaces $F^n$, $M_{m,n}(F)$ and $\mathbb{R}_n[X]$ are finite 
dimensional, by Example 4.29. However, not all vector spaces are finite dimensional 
(actually most of them are not).


Problem 4.42. Prove that the vector space $V$ of all polynomials with real coefficients is not a finite dimensional $\mathbb{R}$-vector space. 


Proof. Suppose that $V$ has a finite spanning set, so there are polynomials 
$P_1, \dots, P_n \in V$ such that $V = \mathrm{Span}(P_1, \dots, P_n)$. Let $d$ be the maximum of 
$\deg(P_1), \dots, \deg(P_n)$. Since all $P_i$ have degree at most $d$, so does any linear 
combination of $P_1, \dots, P_n$. It follows that any vector in $V$ has degree at most $d$, 
which is absurd since $X^{d+1}$ has degree greater than $d$. □


We would like to define the dimension of a finite dimensional vector space. This 
should be an invariant of the vector space and should correspond to the geometric 
picture (you might prefer to take $F = \mathbb{R}$ for a better geometric intuition): a line 
(namely $F$) should have dimension 1, a plane (i.e., $F^2$) should have dimension 2, and in 
general $F^n$ should have dimension $n$. Before stating and proving the main result, let 
us introduce a crucial definition and work through some problems to get a better feeling 
for it.


Definition 4.43. A basis of a vector space V is a subset of V which is linearly 
independent and spanning. 




For instance, the generating families appearing in Example 4.29 are all bases of
the corresponding vector spaces (this explains why we called them canonical bases 
in previous chapters!). 


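Problem 4.44. Given the matrix 

$$A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix},$$

find a basis of the subspace $U$ of $M_2(\mathbb{R})$ defined by 

$$U = \{X \in M_2(\mathbb{R}) \mid XA = AX\}.$$

Solution. Consider a square matrix $X = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}$. Then $X \in U$ if and only if 
$XA = AX$, which can be rewritten as 

$$\begin{pmatrix} 2a_1 & 3a_2 \\ 2a_3 & 3a_4 \end{pmatrix} = \begin{pmatrix} 2a_1 & 2a_2 \\ 3a_3 & 3a_4 \end{pmatrix}.$$

This equality is equivalent to $a_2 = a_3 = 0$. Thus 

$$U = \left\{ \begin{pmatrix} a_1 & 0 \\ 0 & a_4 \end{pmatrix} \;\middle|\; a_1, a_4 \in \mathbb{R} \right\},$$

and so a basis of $U$ is given by the matrices $X_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $X_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ (it is 
not difficult to check that $X_1$ and $X_2$ are linearly independent). □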


Problem 4.45. Determine a basis of the subspace $U$ of $\mathbb{R}^4$, where 
$$U = \{(a,b,c,d) \in \mathbb{R}^4 \mid a + b = 0,\ c = 2d\}.$$
Solution. Since $b = -a$ and $c = 2d$, we can write 
$$U = \{(a, -a, 2d, d) \mid a, d \in \mathbb{R}\} = \{av_1 + dv_2 \mid a, d \in \mathbb{R}\},$$
where $v_1 = (1,-1,0,0)$ and $v_2 = (0,0,2,1)$. Thus $v_1, v_2$ form a generating family 
for $U$. Moreover, they are linearly independent, since the relation $av_1 + dv_2 = 0$ is 
equivalent to $(a, -a, 2d, d) = (0,0,0,0)$ and forces $a = d = 0$. We conclude that 
a basis of $U$ is given by $v_1$ and $v_2$. □


Problem 4.46. Consider the subspaces $U, V$ of $\mathbb{R}^4$ defined by 

$$U = \{(x,y,z,w) \in \mathbb{R}^4 \mid y + z + w = 0\}$$

and 
$$V = \{(x,y,z,w) \in \mathbb{R}^4 \mid x = -y,\ z = 2w\}.$$
Find a basis for each of the subspaces $U$, $V$ and $U \cap V$ of $\mathbb{R}^4$. 
Solution. Expressing $w$ in terms of $y$ and $z$, we obtain 
$$U = \{(x, y, z, -y - z) \mid x, y, z \in \mathbb{R}\} = \{xu_1 + yu_2 + zu_3 \mid x, y, z \in \mathbb{R}\},$$

where $u_1 = (1,0,0,0)$, $u_2 = (0,1,0,-1)$ and $u_3 = (0,0,1,-1)$. Let us see whether 
$u_1, u_2, u_3$ are linearly independent. The equality $xu_1 + yu_2 + zu_3 = 0$ is equivalent 
to $(x, y, z, -y - z) = (0,0,0,0)$ and forces $x = y = z = 0$. Thus $u_1, u_2, u_3$ are 
linearly independent, and therefore they form a basis of $U$. 

Let us deal now with $V$. Clearly 

$$V = \{(-y, y, 2w, w) \mid y, w \in \mathbb{R}\} = \{yv_1 + wv_2 \mid y, w \in \mathbb{R}\},$$
where $v_1 = (-1,1,0,0)$ and $v_2 = (0,0,2,1)$. As above, $v_1$ and $v_2$ are linearly 
independent, since the relation $yv_1 + wv_2 = 0$ is equivalent to $(-y, y, 2w, w) = (0,0,0,0)$ and forces $y = w = 0$. Thus $v_1, v_2$ form a basis of $V$. 
Finally, a vector $(x,y,z,w) \in \mathbb{R}^4$ belongs to $U \cap V$ if and only if 
$$x = -y, \quad z = 2w, \quad y + z + w = 0.$$

This is equivalent to $x = 3w$, $z = 2w$ and $y = -3w$, or 

$$(x, y, z, w) = (3w, -3w, 2w, w) = w(3, -3, 2, 1).$$

Thus $(3, -3, 2, 1)$ forms a basis of $U \cap V$. □
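The computation for $U \cap V$ follows a general recipe: row-reduce the coefficient matrix of the homogeneous system, give each free variable in turn the value 1 (and the other free variables the value 0), and read off the pivot variables. The sketch below is our own illustration (exact `fractions` arithmetic, hypothetical helper names `rref` and `null_space_basis`); for Problem 4.45 it may return a basis differing from the one in the text by signs, since any basis is acceptable.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form over Q with exact fractions.

    Returns (R, pivots): R is the reduced matrix as a list of rows and
    pivots lists the pivot columns in increasing order."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # row that will receive the next pivot
    for c in range(len(R[0]) if R else 0):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]  # scale the pivot to 1
        for i in range(len(R)):
            if i != r and R[i][c] != 0:  # clear column c in every other row
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def null_space_basis(equations):
    """Basis of {X : AX = 0}, where the rows of A are the coefficient rows in
    `equations`; one basis vector per free variable (set to 1, others to 0)."""
    n = len(equations[0])
    R, pivots = rref(equations)
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for row, p in enumerate(pivots):
            x[p] = -R[row][f]  # pivot variable determined by its row
        basis.append(x)
    return basis


# U ∩ V: x + y = 0, z - 2w = 0, y + z + w = 0
for vec in null_space_basis([(1, 1, 0, 0), (0, 0, 1, -2), (0, 1, 1, 1)]):
    print([str(c) for c in vec])  # ['3', '-3', '2', '1']
```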


Problem 4.47. Consider the space $V$ of functions $f : \mathbb{R} \to \mathbb{R}$ spanned by the 
functions in $B = \{1, x \mapsto \sin(2x), x \mapsto \cos(2x)\}$. 


a) Prove that $B$ forms a basis of $V$. 
b) Prove that $x \mapsto \sin^2(x)$ is a function in $V$ and write it as a linear combination of 
elements of $B$. 


Solution. a) Since $B$ spans $V$ by definition, we need to prove that the vectors in $B$ are linearly independent. In 
other words, we need to prove that if $a, b, c$ are real numbers such that 

$$a + b\sin(2x) + c\cos(2x) = 0$$

for all real numbers $x$, then $a = b = c = 0$. Taking $x = 0$ we obtain $a + c = 0$, 
then taking $x = \pi/2$ yields $a - c = 0$. Thus $a = c = 0$. Finally, taking $x = \pi/4$ 
yields $b = 0$. 

b) For all $x \in \mathbb{R}$ we have 
$$\cos(2x) = 2\cos^2(x) - 1 = 2(1 - \sin^2(x)) - 1 = 1 - 2\sin^2(x),$$

thus 

$$\sin^2(x) = \frac{1 - \cos(2x)}{2}.$$

We deduce that $x \mapsto \sin^2(x)$ is in $V$, and the previous formula expresses it as a 
linear combination 

$$\sin^2(x) = \frac{1}{2} \cdot 1 + 0 \cdot \sin(2x) - \frac{1}{2}\cos(2x).$$

□


Let us now prove the first fundamental result regarding the dimension theory of 
vector spaces. 


Theorem 4.48. Let $V$ be a finite dimensional vector space. Then 

a) $V$ contains a basis with finitely many elements. 
b) Any two bases of $V$ have the same number of elements (in particular any basis 
has finitely many elements). 


Proof. a) Among all finite spanning sets $S$ of $V$ (we know that there is at least one 
such set) consider a set $B$ with the smallest possible number of elements. We will 
prove that $B$ is a basis. By our choice, $B$ is a spanning set, so all we need to prove 
is that $B$ is linearly independent. If this is not the case, then Proposition 4.38 
yields the existence of a vector $v \in B$ such that $v \in \mathrm{Span}(B \setminus \{v\})$. It follows 
that $B \setminus \{v\}$ is also a spanning set (indeed, $\mathrm{Span}(B \setminus \{v\})$ contains $B$, hence it contains $\mathrm{Span}(B) = V$). This contradicts the minimality of $B$ and 
shows that $B$ is indeed linearly independent. 

b) Let $B$ be a basis with finitely many elements, say $n$. Let $B'$ be another basis of 
$V$. Then $B'$ is a linearly independent set and $B$ is a spanning set with $n$ elements, 
thus by Remark 4.40 $B'$ is finite, with at most $n$ elements. This shows that any 
basis has at most $n$ elements. But now we can play the following game: say $B'$ 
has $d$ elements. We saw that $d \le n$. We exchange $B$ and $B'$ in the previous 
argument, to get that any basis has at most $d$ elements, thus $n \le d$. It follows 
that $n = d$, and so all bases have the same number of elements. □


The previous theorem allows us to make the following definition: 


Definition 4.49. Let $V$ be a finite dimensional vector space. The dimension $\dim V$ of $V$ is the number of elements of any basis of $V$. 


Example 4.50. a) Consider the vector space $F^n$. Its canonical basis $e_1, \dots, e_n$ is a basis of $F^n$ with $n$ elements, thus $\dim F^n = n$. 

b) Consider the space $F[X]_n$ of polynomials with coefficients in $F$ whose degree 
does not exceed $n$. A basis of $F[X]_n$ is given by $1, X, \dots, X^n$, thus 

$$\dim F[X]_n = n + 1.$$
c) Consider the vector space $M_{m,n}(F)$ of $m \times n$ matrices with entries in $F$. A basis 
of this vector space is given by the elementary matrices $E_{ij}$ with $1 \le i \le m$ and 
$1 \le j \le n$ (the canonical basis of $M_{m,n}(F)$). It follows that 
$$\dim M_{m,n}(F) = mn.$$
Problem 4.51. Find a basis as well as the dimension of the subspace 
$$V = \{(a, 2a) \mid a \in \mathbb{R}\} \subset \mathbb{R}^2.$$
Solution. By definition, $V$ is the linear span of the vector $(1,2)$, since $(a, 2a) = a(1,2)$. Since $(1,2) \neq (0,0)$, we deduce that a basis of $V$ is given by $(1,2)$ and 
$\dim V = 1$. □


The second fundamental theorem concerning dimension theory is the following: 
Theorem 4.52. Let $V$ be a vector space of dimension $n < \infty$. Then 


a) Any linearly independent set in $V$ has at most $n$ elements. 

b) Any spanning set in $V$ has at least $n$ elements. 

c) If $S$ is a subset of $V$ with $n$ elements, then the following assertions are 
equivalent: 


i) $S$ is linearly independent; 
ii) $S$ is a spanning set; 
iii) $S$ is a basis of $V$. 

Proof. Fix a basis $B$ of $V$. By definition, $B$ has $n$ elements. 


a) Since $B$ is a spanning set with $n$ elements, the result follows directly from 
Remark 4.40. 

b) Let $S$ be a spanning set and suppose that $S$ has $d < n$ elements. Since $B$ is 
linearly independent, Theorem 4.39 yields $n \le d$, a contradiction. 

c) Clearly iii) implies i) and ii). It suffices therefore to prove that each of i) and ii) 
implies iii). Suppose that $S$ is linearly independent. By Theorem 4.39 (applied to $L = S$ and the spanning set $B$) we can 
add $n - n = 0$ vectors to $S$ so that the new set is a spanning set. Clearly the 
new set is nothing more than $S$, so $S$ is a spanning set and thus a basis (since by 
assumption $S$ is linearly independent). 

Now suppose that $S$ is a spanning set and that $S$ is not linearly independent. 
By Proposition 4.38 we can find $v \in S$ such that $v \in \mathrm{Span}(S \setminus \{v\})$. Then $S \setminus \{v\}$ 
is a spanning set with $n - 1$ elements, contradicting part b). Thus $S$ is linearly 
independent and a basis of $V$. □
The following problems are all applications of the previous theorem. 


Problem 4.53. Prove that the set $U$, where 
$$U = \{(1,1,1), (1,2,1), (2,1,1)\},$$

is a basis of $\mathbb{R}^3$. 

Solution. Let $v_1 = (1,1,1)$, $v_2 = (1,2,1)$ and $v_3 = (2,1,1)$. Since $\dim \mathbb{R}^3 = 3$, 
it suffices to prove that $v_1, v_2, v_3$ are linearly independent. Suppose that $x, y, z \in \mathbb{R}$ 
satisfy 

$$xv_1 + yv_2 + zv_3 = 0.$$

This can be written as 

$$\begin{cases} x + y + 2z = 0 \\ x + 2y + z = 0 \\ x + y + z = 0 \end{cases}$$

Combining the first and the last equation yields $z = 0$, and similarly, combining the 
second and the last equation yields $y = 0$. Coming back to the first equation, we 
also find $x = 0$, and the result follows. □


Problem 4.54. Determine a basis of $\mathbb{R}^3$ that includes the vector $v = (2,1,1)$. 


Solution. Let $e_1, e_2, e_3$ be the canonical basis of $\mathbb{R}^3$. Then $v = 2e_1 + e_2 + e_3$. It 
follows that $e_3 = v - 2e_1 - e_2$ belongs to the span of $v, e_1, e_2$, thus this span contains $e_1, e_2, e_3$ and is $\mathbb{R}^3$. Thus 
$v, e_1, e_2$ form a basis of $\mathbb{R}^3$, since $\dim \mathbb{R}^3 = 3$ (of course, one can also check directly 
that $v, e_1, e_2$ are linearly independent). □


Problem 4.55. Let $\mathbb{R}_n[X]$ be the vector space of polynomials with real coefficients 
whose degree does not exceed $n$. Prove that if $P_0, P_1, \dots, P_n \in \mathbb{R}_n[X]$ satisfy 
$\deg P_k = k$ for $0 \le k \le n$, then $P_0, P_1, \dots, P_n$ form a basis of $\mathbb{R}_n[X]$. 


Solution. Since $\dim \mathbb{R}_n[X] = n + 1$, it suffices to prove that $P_0, P_1, \dots, P_n$ are 
linearly independent. Suppose that $a_0, a_1, \dots, a_n \in \mathbb{R}$ are not all zero and 

$$a_0P_0 + a_1P_1 + \dots + a_nP_n = 0.$$

Let $j$ be the largest index for which $a_j \neq 0$. Then by hypothesis $a_0P_0 + a_1P_1 + \dots + a_jP_j$ has degree exactly $j$, which contradicts the fact that this polynomial is 
the zero polynomial (since $a_{j+1} = \dots = a_n = 0$ and $a_0P_0 + \dots + a_nP_n = 0$) and 
that the zero polynomial has degree $-\infty$. □


Problem 4.56. Let $P \in \mathbb{R}[X]$ be a polynomial. Prove that the following assertions 
are equivalent: 


a) $P(n)$ is an integer for all integers $n$. 

b) There are integers $n$ and $a_0, \dots, a_n$ such that 

$$P(X) = \sum_{k=0}^{n} a_k \frac{X(X-1)\cdots(X-k+1)}{k!},$$

with the convention that the first term in the sum equals $a_0$. 


Solution. Let $P_k = \frac{X(X-1)\cdots(X-k+1)}{k!}$, with $P_0 = 1$. It is not difficult to see 
that $P_k(\mathbb{Z}) \subset \mathbb{Z}$ (as the values of $P_k$ at all integers are, up to a sign, binomial 
coefficients). This makes it clear that b) implies a). 

Suppose that a) holds and let $d = \deg P$. Since $\deg P_k = k$ for $0 \le k \le d$, 
Problem 4.55 yields real numbers $a_0, a_1, \dots, a_d$ such that $P = a_0P_0 + a_1P_1 + \dots + a_dP_d$. We need to prove that $a_0, \dots, a_d$ are actually integers. But by hypothesis 

$$P(m) = a_0 + \binom{m}{1}a_1 + \binom{m}{2}a_2 + \dots + \binom{m}{m-1}a_{m-1} + a_m$$

are integers for $m = 0, \dots, d$ (note that $P_k(m) = \binom{m}{k}$ for $0 \le k \le m$ and $P_k(m) = 0$ for $k > m$). Using the relation 

$$a_m = P(m) - \left(a_0 + \binom{m}{1}a_1 + \binom{m}{2}a_2 + \dots + \binom{m}{m-1}a_{m-1}\right),$$

it is easy to prove by induction on $j$ that $a_0, \dots, a_j$ are integers for $0 \le j \le d$. 
Thus $a_0, \dots, a_d$ are all integers and the problem is solved. □
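The inductive relation for $a_m$ translates directly into a short computation. The sketch below is our own illustration: the function name `binomial_coordinates` is hypothetical, and $P$ is passed as a Python function of degree at most $d$ (an assumption for the example). It recovers $a_0, \dots, a_d$ from $P(0), \dots, P(d)$ using `math.comb` for the binomial coefficients and exact fractions, so a non-integer coefficient shows up when $P$ is not integer-valued.

```python
from fractions import Fraction
from math import comb


def binomial_coordinates(P, d):
    """Return [a_0, ..., a_d] with P = sum_k a_k * X(X-1)...(X-k+1)/k!,
    for a polynomial P of degree at most d given as a Python function,
    via a_m = P(m) - sum_{j < m} C(m, j) * a_j."""
    a = []
    for m in range(d + 1):
        a.append(Fraction(P(m)) - sum(comb(m, j) * a[j] for j in range(m)))
    return a


# X^3 = 6 * C(X, 3) + 6 * C(X, 2) + C(X, 1): integer coordinates
print([str(c) for c in binomial_coordinates(lambda x: x ** 3, 3)])  # ['0', '1', '6', '6']
# X/2 is not integer-valued, and indeed a_1 = 1/2
print([str(c) for c in binomial_coordinates(lambda x: Fraction(x, 2), 1)])  # ['0', '1/2']
```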


Before moving on to another fundamental theorem, let us stop and explain 
how to solve a few practical problems. First, consider some vectors $v_1, \dots, v_k$ 
in $\mathbb{R}^n$ and consider the problem of deciding whether they form a basis of $\mathbb{R}^n$. By the 
previous results, this is the case if and only if $k = n$ and $v_1, \dots, v_k$ are linearly 
independent. If $A$ denotes the matrix whose columns are the coordinates of $v_1, \dots, v_k$, this is equivalent to saying that $k = n$ and $A_{\mathrm{ref}} = I_n$. We see that we 
have an algorithmic solution for our problem.

Consider now the problem: given $v_1, \dots, v_k$ in $\mathbb{R}^n$, decide whether they span $\mathbb{R}^n$. To solve this problem, we consider the matrix $A$ whose rows are given by the coordinates of the vectors $v_1, \dots, v_k$ with respect to the canonical basis of $\mathbb{R}^n$. We row-reduce $A$ and obtain its reduced row-echelon form $A_{\mathrm{ref}}$. Then $v_1, \dots, v_k$ span $\mathbb{R}^n$ if and only if the rows of $A_{\mathrm{ref}}$ span $\mathbb{R}^n$. This is the case if and only if $A_{\mathrm{ref}}$ has a pivot in every column.

Next, consider the following trickier problem: given some vectors $v_1, \dots, v_k$ in $\mathbb{R}^n$, find a subset of $\{v_1, \dots, v_k\}$ which forms a basis of $\mathbb{R}^n$. Of course, if $v_1, \dots, v_k$ do not span $\mathbb{R}^n$, then the problem has no solution (and we can test this using the procedure described in the previous paragraph). Assume now that $v_1, \dots, v_k$ span $\mathbb{R}^n$. Let $A$ be the matrix whose columns are given by the coordinates of $v_1, \dots, v_k$ in the canonical basis of $\mathbb{R}^n$. We leave it to the reader to convince himself that those vectors $v_i$ corresponding to the columns of $A_{\mathrm{ref}}$ containing a pivot form a basis of $\mathbb{R}^n$.
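Example 4.57. Consider the vectors $v_1 = (1, 0, -1, 0)$, $v_2 = (0, 1, -1, 1)$, $v_3 = (2, 3, -12, -1)$, $v_4 = (1, 1, 1, 1)$, $v_5 = (1, -1, 0, -1)$. We would like to find a subset of these vectors which gives a basis of $\mathbb{R}^4$. Let us check first whether they span $\mathbb{R}^4$. For that, we consider the matrix

$$A = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 1 \\ 2 & 3 & -12 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & -1 & 0 & -1 \end{bmatrix}$$

whose rows are the coordinates of $v_1, \dots, v_5$. Its row-reduction is

$$A_{\mathrm{ref}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

and it has pivots in every column, thus $v_1, \dots, v_5$ span $\mathbb{R}^4$.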
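Now, to solve the original problem, we consider the matrix

$$A' = \begin{bmatrix} 1 & 0 & 2 & 1 & 1 \\ 0 & 1 & 3 & 1 & -1 \\ -1 & -1 & -12 & 1 & 0 \\ 0 & 1 & -1 & 1 & -1 \end{bmatrix}$$

whose columns are the coordinates of $v_1, v_2, v_3, v_4, v_5$. Its row-reduction is

$$A'_{\mathrm{ref}} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$

The columns containing pivots are the first four, so $v_1, v_2, v_3, v_4$ form a basis of $\mathbb{R}^4$. (The last column also shows that $v_5 = v_1 - v_2$.)

Note that we could have read whether $v_1, \dots, v_5$ span $\mathbb{R}^4$ directly on $A'_{\mathrm{ref}}$, without the need to introduce the matrix $A$. Indeed, it suffices to check that $A'_{\mathrm{ref}}$ has a pivot in every row, which is the case.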
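The procedure used in this example (put the vectors in the columns, row-reduce, keep the pivot columns) can be sketched in Python with exact rational arithmetic. This is a minimal illustration rather than a library routine; `rref` and `pivot_basis` are our own helper names. Applied to the vectors of Example 4.57, the reduced matrix also exhibits the relation $v_5 = v_1 - v_2$ in its last column.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, pivot column indices) of a matrix given as rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots, r = [], 0
    for c in range(n_cols):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]           # make the pivot equal to 1
        for i in range(n_rows):                   # clear the rest of column c
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return m, pivots


def pivot_basis(vectors):
    """Keep the vectors whose column in the row-reduced matrix contains a pivot."""
    columns = [list(row) for row in zip(*vectors)]  # column j holds the coordinates of vectors[j]
    _, pivots = rref(columns)
    return [vectors[j] for j in pivots]


vectors = [(1, 0, -1, 0), (0, 1, -1, 1), (2, 3, -12, -1), (1, 1, 1, 1), (1, -1, 0, -1)]
A_prime = [list(row) for row in zip(*vectors)]
A_prime_ref, pivots = rref(A_prime)
print(pivots)                # [0, 1, 2, 3]: a pivot in every row, so the vectors span R^4
print(pivot_basis(vectors))  # [(1, 0, -1, 0), (0, 1, -1, 1), (2, 3, -12, -1), (1, 1, 1, 1)]
```

The vectors span $\mathbb{R}^n$ exactly when the number of pivots equals $n$, the number of rows of $A'$.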


Problem 4.58. Let $S$ be the set

$$S = \{(1, -2, 0),\ (0, 1, 2),\ (-5, 6, -8),\ (1, -2, 1)\} \subset \mathbb{R}^3.$$

a) Show that $S$ spans the space $\mathbb{R}^3$ and find a basis for $\mathbb{R}^3$ contained in $S$.
b) What are the coordinates of the vector $c = (1, 1, 1)$ with respect to the basis found in a)?


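Solution. a) Consider the matrix

$$A = \begin{bmatrix} 1 & 0 & -5 & 1 \\ -2 & 1 & 6 & -2 \\ 0 & 2 & -8 & 1 \end{bmatrix}$$

whose columns are the vectors of $S$. Its row-reduction is

$$A_{\mathrm{ref}} = \begin{bmatrix} 1 & 0 & -5 & 0 \\ 0 & 1 & -4 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Since $A_{\mathrm{ref}}$ has a pivot in each row, the columns of $A$ span $\mathbb{R}^3$, thus $S$ spans $\mathbb{R}^3$. Considering the pivot columns (the first, second and fourth), we also deduce that a subset of $S$ that forms a basis of $\mathbb{R}^3$ consists of

$$\begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}.$$

b) Since

$$\left[\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ -2 & 1 & -2 & 1 \\ 0 & 2 & 1 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & 6 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & -5 \end{array}\right],$$

the coordinates of $c$ with respect to this basis are given by the last column of the matrix above, namely $6, 3, -5$. □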
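Part b) amounts to solving a square linear system by row-reducing the augmented matrix $[\,b_1\ b_2\ b_3 \mid c\,]$. A minimal Python sketch follows; it assumes the given vectors really form a basis (otherwise no pivot is found and `next` raises `StopIteration`), and the function name `coordinates` is ours.

```python
from fractions import Fraction


def coordinates(basis, c):
    """Solve x_1 b_1 + ... + x_n b_n = c by Gauss-Jordan elimination on [b_1 ... b_n | c]."""
    n = len(basis)
    # Row i of the augmented matrix: i-th coordinates of b_1, ..., b_n, then c_i.
    m = [[Fraction(vec[i]) for vec in basis] + [Fraction(c[i])] for i in range(n)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [x / lead for x in m[col]]
        for i in range(n):
            if i != col:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[col])]
    return [row[-1] for row in m]  # the last column of the reduced augmented matrix


basis = [(1, -2, 0), (0, 1, 2), (1, -2, 1)]
print(coordinates(basis, (1, 1, 1)))  # [Fraction(6, 1), Fraction(3, 1), Fraction(-5, 1)]
```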


Theorem 4.59. Let $V$ be a finite dimensional vector space and let $W$ be a subspace of $V$. Then

a) $W$ is finite dimensional and $\dim W \le \dim V$. Moreover, we have equality if and only if $W = V$.
b) Any basis of $W$ can be extended to a basis of $V$.


Proof. Let $n = \dim V$.

a) If $S$ is any linearly independent set in $W$, then $S$ is a linearly independent set in $V$ and so $S$ has at most $n$ elements by part a) of Theorem 4.52. Note that if we manage to prove that $W$ is finite dimensional, then the previous observation automatically implies that $\dim W \le n$ (as any basis of $W$ is a linearly independent set in $W$). Suppose that $W$ is not finite dimensional. Since $W$ is nonzero, we can choose $w_1 \in W$ nonzero. Since $\{w_1\}$ is not a spanning set for $W$, we can choose $w_2 \in W$ not in the span of $w_1$. Assuming that we constructed $w_1, \dots, w_k$, simply choose any vector $w_{k+1}$ in $W$ but not in the span of $w_1, \dots, w_k$. Such a vector exists, since by assumption the finite set $\{w_1, \dots, w_k\}$ is not a spanning set for $W$. By construction, $w_1, \dots, w_k$ are linearly independent for all $k$. Thus $w_1, \dots, w_{n+1}$ is a linearly independent set with more than $n$ elements in $W$, which is absurd. Thus $W$ is finite dimensional and $\dim W \le n$.

We still have to prove that $\dim W = n$ implies $W = V$. Let $B$ be a basis of $W$. Then $B$ has $n$ elements and is a linearly independent set in $V$. By part c) of Theorem 4.52, $B$ is a spanning set for $V$, and since it is contained in $W$, we deduce that $W = V$.

b) Let $d = \dim W \le n$ and let $B$ be a basis of $W$. Let $B'$ be a basis of $V$. By Theorem 4.39 applied to the linearly independent set $B$ in $V$ and to the spanning set $B'$ in $V$, we can add $n - d$ elements to $B$ to make it a spanning set. This set has $n$ elements and is a spanning set, thus it is a basis of $V$ (part c) of Theorem 4.52) and contains $B$. This is exactly what we needed. □


The following result is very handy when estimating the dimension of a sum of 
subspaces of a given vector space. 


Theorem 4.60 (Grassmann’s formula). If $W_1, W_2$ are subspaces of a finite dimensional vector space $V$, then

$$\dim W_1 + \dim W_2 = \dim(W_1 + W_2) + \dim(W_1 \cap W_2).$$


Proof. Let $m = \dim W_1$, $n = \dim W_2$ and $k = \dim(W_1 \cap W_2)$. Let $B = \{v_1, \dots, v_k\}$ be a basis of $W_1 \cap W_2$. Since $W_1 \cap W_2$ is a subspace of both $W_1$ and $W_2$, Theorem 4.59 yields bases $B_1, B_2$ of $W_1$ and $W_2$ which contain $B$. Say $B_1 = \{v_1, \dots, v_k, u_1, \dots, u_{m-k}\}$ and $B_2 = \{v_1, \dots, v_k, w_1, \dots, w_{n-k}\}$. We will prove that the family

$$S = \{v_1, \dots, v_k, u_1, \dots, u_{m-k}, w_1, \dots, w_{n-k}\}$$

is a basis of $W_1 + W_2$, and so

$$\dim(W_1 + W_2) = k + (m - k) + (n - k) = m + n - k,$$

as desired.

We start by proving that $S$ is a spanning set for $W_1 + W_2$. Let $x$ be any vector in $W_1 + W_2$. By definition we can write $x = x_1 + x_2$ with $x_1 \in W_1$ and $x_2 \in W_2$. Since $B_1$ and $B_2$ are spanning sets for $W_1$ and $W_2$, we can write

$$x_1 = a_1v_1 + \dots + a_kv_k + b_1u_1 + \dots + b_{m-k}u_{m-k}$$

and

$$x_2 = c_1v_1 + \dots + c_kv_k + d_1w_1 + \dots + d_{n-k}w_{n-k}$$

for some scalars $a_i, b_j, c_l, d_p$. Then

$$x = (a_1 + c_1)v_1 + \dots + (a_k + c_k)v_k + b_1u_1 + \dots + b_{m-k}u_{m-k} + d_1w_1 + \dots + d_{n-k}w_{n-k}$$

is in the span of $S$. Since $x$ was arbitrary in $W_1 + W_2$, it follows that $S$ spans $W_1 + W_2$.

Finally, let us prove that $S$ is a linearly independent set in $W_1 + W_2$. Suppose that

$$a_1v_1 + \dots + a_kv_k + b_1u_1 + \dots + b_{m-k}u_{m-k} + c_1w_1 + \dots + c_{n-k}w_{n-k} = 0$$

for some scalars $a_i, b_j, c_l$. Then

$$a_1v_1 + \dots + a_kv_k + b_1u_1 + \dots + b_{m-k}u_{m-k} = -(c_1w_1 + \dots + c_{n-k}w_{n-k}).$$

The left-hand side belongs to $W_1$ and the right-hand side belongs to $W_2$, hence both sides belong to $W_1 \cap W_2$, and so they are linear combinations of $v_1, \dots, v_k$. Thus we can write

$$a_1v_1 + \dots + a_kv_k + b_1u_1 + \dots + b_{m-k}u_{m-k} = d_1v_1 + \dots + d_kv_k$$

for some scalars $d_1, \dots, d_k$. Writing the previous relation as

$$(a_1 - d_1)v_1 + \dots + (a_k - d_k)v_k + b_1u_1 + \dots + b_{m-k}u_{m-k} = 0$$

and using the fact that $v_1, \dots, v_k, u_1, \dots, u_{m-k}$ are linearly independent, it follows that $a_1 = d_1, \dots, a_k = d_k$ and $b_1 = \dots = b_{m-k} = 0$. By symmetry we also obtain $c_1 = \dots = c_{n-k} = 0$. Then $a_1v_1 + \dots + a_kv_k = 0$ and since $v_1, \dots, v_k$ are linearly independent, we conclude that $a_1 = \dots = a_k = 0$. Thus all scalars $a_i, b_j, c_l$ are $0$ and $S$ is a linearly independent set. This finishes the proof of the theorem. □
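Grassmann's formula can be checked numerically on a concrete pair of subspaces of $\mathbb{Q}^4$. In the Python sketch below (helper names are ours, and the two subspaces are an arbitrary illustration), dimensions are computed as ranks, and $W_1 \cap W_2$ is computed independently of the formula: a vector lies in the intersection exactly when it equals $\sum x_i u_i = \sum y_j w_j$, so we solve the homogeneous system in the unknowns $x_i, y_j$ and map each solution to $\sum x_i u_i$.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def dim_span(vectors):
    """Dimension of the span = number of pivots (the vectors are the rows)."""
    return len(rref(vectors)[1]) if vectors else 0


def null_space(rows):
    """A basis of the solutions of the homogeneous system with coefficient matrix `rows`."""
    m, pivots = rref(rows)
    n_cols = len(rows[0])
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        x = [Fraction(0)] * n_cols
        x[free] = Fraction(1)
        for r, p in enumerate(pivots):
            x[p] = -m[r][free]
        basis.append(x)
    return basis


def intersection(U, W):
    """Spanning vectors of span(U) intersected with span(W): solve sum x_i u_i - sum y_j w_j = 0."""
    dim = len(U[0])
    system = [[u[i] for u in U] + [-w[i] for w in W] for i in range(dim)]
    return [[sum(sol[k] * U[k][i] for k in range(len(U))) for i in range(dim)]
            for sol in null_space(system)]


W1 = [(1, 0, 1, 0), (0, 1, 0, 1)]
W2 = [(1, 1, 1, 1), (1, 0, 0, 0)]
lhs = dim_span(W1) + dim_span(W2)
rhs = dim_span(W1 + W2) + dim_span(intersection(W1, W2))
print(lhs, rhs)  # 4 4
```

Here $W_1 + W_2$ has dimension $3$ and $W_1 \cap W_2$ is the line spanned by $(1,1,1,1)$.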


Remark 4.61. Suppose that $W_1, W_2$ are subspaces of a finite dimensional vector space $V$, such that $V = W_1 \oplus W_2$. If $B_1$ and $B_2$ are bases for $W_1$ and $W_2$, then $B_1 \cup B_2$ is a basis for $V$. This follows from the proof of the previous theorem, or it can simply be checked by unwinding definitions. More generally, if a vector space $V$ is the direct sum of subspaces $W_1, \dots, W_n$ and $B_i$ is a basis for $W_i$ ($1 \le i \le n$), then $B_1 \cup \dots \cup B_n$ is a basis for $V$. We leave this as an easy exercise for the reader.


Problem 4.62. Let $V_1, V_2, \dots, V_k$ be subspaces of a finite dimensional vector space $V$. Prove that

$$\dim(V_1 + V_2 + \dots + V_k) \le \dim V_1 + \dim V_2 + \dots + \dim V_k.$$

Solution. It suffices to prove the result for $k = 2$, as then an immediate induction on $k$ yields the result in general (noting that $V_1 + V_2 + \dots + V_k = (V_1 + V_2) + V_3 + \dots + V_k$ for $k \ge 3$). But for $k = 2$ this follows from Grassmann's formula:

$$\dim(V_1 + V_2) = \dim V_1 + \dim V_2 - \dim(V_1 \cap V_2) \le \dim V_1 + \dim V_2. \qquad \square$$


Problem 4.63. Let $V$ be a finite dimensional vector space over a field $F$. Let $U, W$ be subspaces of $V$. Prove that $V = U \oplus W$ if and only if $V = U + W$ and

$$\dim V = \dim U + \dim W.$$

Solution. If $V = U \oplus W$, then clearly $V = U + W$ and we can obtain a basis of $V$ by patching a basis of $U$ and one of $W$ (Remark 4.61), so $\dim V = \dim U + \dim W$. Suppose now that $V = U + W$ and $\dim V = \dim U + \dim W$. We need to prove that $U \cap W = \{0\}$. But by Grassmann's formula

$$\dim(U \cap W) = \dim U + \dim W - \dim(U + W) = \dim V - \dim V = 0,$$

thus $U \cap W = \{0\}$. □


4.5.1 Problems for Practice 
1. Do the following two sets of vectors span the same subspace of $\mathbb{R}^3$?
$X = \{(1,1,0), (3,2,2)\}$ and $Y = \{(7,3,8), (1,0,2), (8,3,10)\}$


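2. The set $S$ consists of the following 5 matrices:
$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.$$
(a) Determine a basis $B$ of $M_2(\mathbb{R})$ included in $S$.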
(b) Write ; 4 as a linear combination of elements of B. 
3. Let $e_1, e_2, e_3, e_4$ be the canonical basis of $\mathbb{R}^4$ and consider the vectors
$$v_1 = e_1 + e_4, \quad v_2 = e_3, \quad v_3 = e_2, \quad v_4 = e_2 + e_4.$$

a) Are the subspaces $\operatorname{Span}(v_1, v_2)$ and $\operatorname{Span}(v_3, v_4)$ in direct sum position?
b) Are the subspaces $\operatorname{Span}(v_1, v_2, v_3)$ and $\operatorname{Span}(v_4)$ in direct sum position?


4. Let $V$ be the set of polynomials $f$ with real coefficients of degree not exceeding 4 and such that $f(1) = f(-1) = 0$.

a) Prove that $V$ is a subspace of the space of all polynomials with real coefficients.
b) Find a basis of $V$ and its dimension.


5. Let $V$ be the set of vectors $(x, y, z, t) \in \mathbb{R}^4$ such that $x = z$ and $y = t$.

a) Prove that $V$ is a subspace of $\mathbb{R}^4$.
b) Give a basis and the dimension of $V$.
c) Complete the basis found in b) to a basis of $\mathbb{R}^4$.


6. Consider the set $V$ of vectors $(x_1, x_2, x_3, x_4) \in \mathbb{R}^4$ such that


xı +x; =0 and x+ x4= Q. 


a) Prove that $V$ is a subspace of $\mathbb{R}^4$.

b) Give a basis and the dimension of $V$.

c) Let $W$ be the span of the vectors $(1,1,1,1)$, $(1,-1,1,-1)$ and $(1,0,1,0)$. Give a basis of $W$ and find $V + W$ and $V \cap W$ (you are asked to give a basis for each of these spaces).


7. A set of three linearly independent vectors can be chosen among

$$u = (1, 0, -1), \quad v = (2, 1, 1), \quad w = (4, 1, -1) \quad \text{and} \quad x = (1, 1, 1).$$

(a) Determine such a set and show that it is indeed linearly independent.
(b) Determine a nontrivial dependence relation among the four given vectors.


8. Exactly one of the vectors $b_1 = (7, 2, 5)$ and $b_2 = (7, 2, -5)$ can be written as a linear combination of the column vectors of the matrix

$$A = \begin{bmatrix} 1 & 0 & 3 \\ 1 & 1 & 4 \\ 0 & 1 & 1 \end{bmatrix}.$$

Determine which one and express it as a linear combination of the column vectors of $A$.


9. Let $V$ be the set of matrices $A \in M_n(\mathbb{C})$ for which $a_{ij} = 0$ whenever $i - j$ is odd.

a) Prove that $V$ is a subspace of $M_n(\mathbb{C})$ and that the product of two matrices in $V$ belongs to $V$.
b) Find the dimension of $V$ as $\mathbb{C}$-vector space.
10. Let $V$ be the set of matrices $A \in M_n(\mathbb{R})$ such that

$$a_{n+1-i,\,n+1-j} = a_{ij}$$

for $i, j \in [1, n]$.

a) Prove that $V$ is a subspace of $M_n(\mathbb{R})$.
b) Find the dimension of $V$ as $\mathbb{R}$-vector space.


11. Find all real numbers $x$ for which the vectors

$$v_1 = (1, 0, x), \quad v_2 = (1, 1, x), \quad v_3 = (x, 0, 1)$$

form a basis of $\mathbb{R}^3$.

12. Let $P_k = X^k(1 - X)^{n-k}$. Prove that $P_0, \dots, P_n$ is a basis of the space of polynomials with real coefficients whose degree does not exceed $n$.

13. Let $V$ be a vector space of dimension $n$ over $\mathbb{F}_2$. Prove that for all $d \in [1, n]$ the following assertions hold:

a) There are $(2^n - 1)(2^n - 2)\cdots(2^n - 2^{d-1})$ $d$-tuples $(v_1, \dots, v_d)$ in $V^d$ such that the family $v_1, \dots, v_d$ is linearly independent.

b) There are $(2^n - 1)(2^n - 2)\cdots(2^n - 2^{n-1})$ invertible matrices in $M_n(\mathbb{F}_2)$.

c) There are

$$\frac{(2^n - 1)(2^{n-1} - 1)\cdots(2^{n-d+1} - 1)}{(2^d - 1)(2^{d-1} - 1)\cdots(2 - 1)}$$

subspaces of dimension $d$ in $V$.


Chapter 5 
Linear Transformations 


Abstract While the previous chapter dealt with individual vector spaces, in this 
chapter we focus on the interaction between two vector spaces by studying linear 
maps between them. Using the representation of linear maps in terms of matrices, 
we obtain some rather surprising results concerning matrices, which would be 
difficult to prove otherwise. 


Keywords Linear maps · Kernel · Image · Projection · Symmetry · Stable subspace · Change of basis · Matrix · Rank


The goal of this chapter is to develop the theory of linear maps between vector 
spaces. In other words, while the previous chapter dealt with basic properties of 
individual vector spaces, in this chapter we are interested in the interactions between 
two vector spaces. We will see that one can understand linear maps between finite 
dimensional vector spaces in terms of matrices and, more importantly and perhaps 
surprisingly at first sight, that we can study properties of matrices using linear maps 
and properties of vector spaces that were established in the previous chapter. 


5.1 Definitions and Objects Canonically Attached 
to a Linear Map 


Unless stated otherwise, all vector spaces will be over a field $F$, which the reader can take to be $\mathbb{R}$ or $\mathbb{C}$. In the previous chapter we defined and studied the basic properties of vector spaces. In this chapter we will deal with maps between vector spaces. We will not consider all maps, but only those which are compatible with the algebraic structures on vector spaces, namely addition and scalar multiplication. More precisely:


Definition 5.1. Let $V, W$ be vector spaces over $F$. A linear map (or linear transformation or homomorphism) between $V$ and $W$ is a map $T : V \to W$ satisfying the following two properties:

1) $T(v_1 + v_2) = T(v_1) + T(v_2)$ for all vectors $v_1, v_2 \in V$, and
2) $T(cv) = cT(v)$ for all $v \in V$ and all scalars $c \in F$.


The reader will notice the difference between this definition and the definition of linear maps in other parts of mathematics: very often in elementary algebra or in real analysis when we refer to a linear map we mean a map $f : \mathbb{R} \to \mathbb{R}$ of the form $f(x) = ax + b$ for some real numbers $a, b$. Such a map is a linear map from the point of view of linear algebra if and only if $b = 0$ (we refer to the more general maps $x \mapsto ax + b$ as affine maps in linear algebra).

In practice, instead of checking separately that $T$ respects addition and scalar multiplication, it may be advantageous to prove directly that

$$T(v_1 + cv_2) = T(v_1) + cT(v_2)$$

for all vectors $v_1, v_2 \in V$ and all scalars $c \in F$.


Problem 5.2. If $T : V \to W$ is a linear transformation, then $T(0) = 0$ and $T(-v) = -T(v)$ for all $v \in V$.


Solution. Since $T$ is linear, we have

$$T(0) = T(0 + 0) = T(0) + T(0),$$

thus $T(0) = 0$. Similarly,

$$T(-v) = T((-1)v) = (-1)T(v) = -T(v). \qquad \square$$


Example 5.3. a) If $V$ is a vector space over $F$ and $c \in F$ is a scalar, then the map $T : V \to V$ sending $v$ to $cv$ is linear (this follows from the definition of a vector space). For $c = 0$ we obtain the zero map, which we simply denote $0$, while for $c = 1$ we obtain the identity map, denoted $\mathrm{id}$. In general, linear maps of the form $v \mapsto cv$ for some scalar $c \in F$ are called scalar linear maps.

b) Consider the vector space $V = \mathbb{R}[X]$ of polynomials with real coefficients (we could allow coefficients in any field). The map $T : V \to V$ sending $P$ to its derivative $P'$ is linear, as follows immediately from its definition. Note that if $\deg P \le n$, then $\deg P' \le n$, thus the map $T$ restricts to a linear map $T : \mathbb{R}_n[X] \to \mathbb{R}_n[X]$ for all $n$ (recall that $\mathbb{R}_n[X]$ is the vector subspace of $V$ consisting of polynomials whose degree does not exceed $n$).

c) The map $T : \mathbb{R}^2 \to \mathbb{R}$ defined by $T(x, y) = xy + 1$ is not linear, since $T(0, 0) = 1 \neq 0$ (by Problem 5.2). Similarly, the map $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(x, y) = (x, y + 1)$ is not linear.

d) Consider the vector space $V$ of continuous real-valued maps on $[0, 1]$. Then the map $T : V \to \mathbb{R}$ sending $f \in V$ to $\int_0^1 f(x)\,dx$ is linear. This follows from properties of integration.

e) Consider the trace map $\operatorname{Tr} : M_n(F) \to F$ defined by

$$\operatorname{Tr}(A) = a_{11} + a_{22} + \dots + a_{nn} \quad \text{if } A = [a_{ij}].$$

By definition of the operations in $M_n(F)$, this map is linear. It has the extremely important property that

$$\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$$

for all matrices $A, B \in M_n(F)$. Indeed, one checks using the product rule that both terms are equal to $\sum_{i,j=1}^{n} a_{ij}b_{ji}$.

f) In the chapter devoted to basic properties of matrices, we saw that any matrix $A \in M_{m,n}(F)$ defines a linear map $F^n \to F^m$ via $X \mapsto AX$. We also proved in that chapter that any linear map $T : F^n \to F^m$ comes from a unique matrix $A \in M_{m,n}(F)$. For example, the map $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by $T(x_1, x_2, x_3) = (x_1, x_2)$ for all $x_1, x_2, x_3 \in \mathbb{R}$ is linear and associated with the matrix $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. The linear maps $T : F^n \to F^m$ are exactly the maps

$$T(x_1, \dots, x_n) = (a_{11}x_1 + \dots + a_{1n}x_n, \dots, a_{m1}x_1 + \dots + a_{mn}x_n)$$

with $a_{ij} \in F$.


g) We introduce now a fundamental class of linear transformations: projections onto subspaces. Suppose that $V$ is a vector space over a field $F$ and that $W_1, W_2$ are subspaces of $V$ such that $V = W_1 \oplus W_2$. The projection onto $W_1$ along $W_2$ is the map $p : V \to W_1$ defined as follows: for each $v \in V$, $p(v)$ is the unique vector in $W_1$ for which $v - p(v) \in W_2$. This makes sense, since by assumption $v$ can be uniquely written as $v_1 + v_2$ with $v_1 \in W_1$ and $v_2 \in W_2$, and so necessarily $p(v) = v_1$. It may not be immediately clear that the map $p$ is linear, but this is actually not difficult: assume that $v, v' \in V$ and let $w = p(v)$ and $w' = p(v')$. Then $w, w' \in W_1$, so $w + w' \in W_1$ and

$$(v + v') - (w + w') = (v - w) + (v' - w') \in W_2,$$

so by definition

$$p(v + v') = w + w' = p(v) + p(v').$$

We leave it to the reader to check that $p(av) = ap(v)$ for $v \in V$ and $a \in F$, using a similar argument. Note that $p(v) = v$ for all $v \in W_1$, but $p(v) = 0$ for all $v \in W_2$. In general, we call a linear map $T : V \to V$ a projection if there is a decomposition $V = W_1 \oplus W_2$ such that $T$ is the projection onto $W_1$ along $W_2$.

h) Assume that we are in the situation described in g). We will define a second fundamental class of linear maps, namely symmetries with respect to subspaces. More precisely, for any decomposition $V = W_1 \oplus W_2$ of $V$ into the direct sum of two subspaces $W_1, W_2$ we define the symmetry $s : V \to V$ with respect to $W_1$ along $W_2$ as follows: take a vector $v \in V$, write it $v = w_1 + w_2$ with $w_1 \in W_1$ and $w_2 \in W_2$, and set

$$s(v) = w_1 - w_2.$$

Again, it is not difficult to check that $s$ is a linear map. Note that $s(v) = v$ whenever $v \in W_1$ and $s(v) = -v$ whenever $v \in W_2$. Note that if $F = \mathbb{F}_2$, then $s$ is the identity map, since $-v = v$ for all $v \in V$ if $V$ is a vector space over $\mathbb{F}_2$. In general a linear map $T : V \to V$ is called a symmetry if there is a decomposition $V = W_1 \oplus W_2$ such that $T$ is the symmetry with respect to $W_1$ along $W_2$.
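The identity $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$ from e) is easy to test numerically. The Python sketch below (helper names are ours) compares both traces with the double sum $\sum_{i,j} a_{ij}b_{ji}$ on random integer matrices; it is a sanity check, not a proof. The last lines go slightly beyond the text: the same argument works for rectangular $A \in M_{m,n}(F)$ and $B \in M_{n,m}(F)$, even though $AB$ and $BA$ then have different sizes.

```python
import random


def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B, both given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def random_matrix(rows, cols):
    return [[random.randint(-9, 9) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
A, B = random_matrix(4, 4), random_matrix(4, 4)
double_sum = sum(A[i][j] * B[j][i] for i in range(4) for j in range(4))
print(trace(mat_mul(A, B)) == trace(mat_mul(B, A)) == double_sum)  # True

# Rectangular case: C is 2 x 3 and D is 3 x 2, so CD is 2 x 2 while DC is 3 x 3.
C, D = random_matrix(2, 3), random_matrix(3, 2)
print(trace(mat_mul(C, D)) == trace(mat_mul(D, C)))  # True
```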


Suppose that $T : V \to V$ is a linear transformation. If $W$ is a subspace of $V$, there is no reason to have $T(W) \subset W$. However, the subspaces $W$ with this property play an absolutely crucial role in the study of linear maps and deserve a special name and a definition. They will be extensively used in subsequent chapters dealing with deeper properties of linear maps.


Definition 5.4. Let $T : V \to V$ be a linear map on a vector space $V$. A subspace $W$ of $V$ is called stable under $T$ or $T$-stable if $T(W) \subset W$.


Problem 5.5. Consider the map $T : \mathbb{R}^2 \to \mathbb{R}^2$ sending $(x_1, x_2)$ to $(x_2, -x_1)$. Find all subspaces of $\mathbb{R}^2$ which are stable under $T$.


Solution. Let $W$ be a subspace of $\mathbb{R}^2$ which is stable under $T$. Since $\mathbb{R}^2$ and $\{0\}$ are obviously stable under $T$, let us assume that $W \neq \{0\}, \mathbb{R}^2$. Then necessarily $\dim W = 1$, that is $W = \mathbb{R}v$ for some nonzero vector $v = (x_1, x_2)$. Since $W$ is stable under $T$, there is a scalar $c \in \mathbb{R}$ such that $T(v) = cv$, that is $(x_2, -x_1) = (cx_1, cx_2)$. We deduce that $x_2 = cx_1$ and $-x_1 = cx_2 = c^2x_1$. Thus $(c^2 + 1)x_1 = 0$ and since $c \in \mathbb{R}$, we must have $x_1 = 0$ and then $x_2 = 0$, thus $v = 0$, a contradiction. This shows that the only subspaces stable under $T$ are $\mathbb{R}^2$ and $\{0\}$. □


Remark 5.6. The result of the previous problem is no longer the same if we replace $\mathbb{R}$ by $\mathbb{C}$. In this new situation the line spanned by $(1, i) \in \mathbb{C}^2$ is stable under $T$, since $T(1, i) = (i, -1) = i(1, i)$.
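A quick check of Remark 5.6 in Python, using built-in complex numbers: for $v = (1, i)$ we get $T(v) = (i, -1) = i\,v$, so the complex line $\mathbb{C}v$ is stable. The second part of the sketch illustrates an alternative way (not the one used in the solution above) to see the real case: for real $v$, the dot product $v \cdot T(v) = x_1x_2 - x_2x_1$ vanishes, so if $T(v) = cv$ with $v \neq 0$, then $c\,\|v\|^2 = 0$ forces $c = 0$, which is impossible since $T(v) \neq 0$.

```python
def T(v):
    """The map (x1, x2) -> (x2, -x1) from Problem 5.5."""
    x1, x2 = v
    return (x2, -x1)


# Over C: v = (1, i) satisfies T(v) = i*v, so the line C*v is T-stable.
v = (1, 1j)
c = T(v)[0] / v[0]
print(c, T(v) == (c * v[0], c * v[1]))  # 1j True

# Over R: T(w) is orthogonal to w, so it is never a nonzero multiple of w.
for w in [(1, 0), (3, -2), (0.5, 7)]:
    tw = T(w)
    assert w[0] * tw[0] + w[1] * tw[1] == 0
```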


If $W$ is a stable subspace, then $T$ restricts to a linear map $T|_W : W \to W$. One-dimensional stable subspaces (i.e., lines in $V$ stable under $T$) will be fundamental objects associated with linear transformations on finite dimensional vector spaces. The following exercise studies those linear maps $T$ for which every line is a stable subspace.


Problem 5.7. Let $V$ be a vector space over some field $F$ and let $T : V \to V$ be a linear transformation. Suppose that all lines in $V$ are stable subspaces under $T$. Prove that there is a scalar $c \in F$ such that $T(x) = cx$ for all $x \in V$.


Solution. Let $x \in V$ be nonzero and consider the line $L = Fx$ spanned by $x$. By hypothesis $T(L) \subset L$, thus we can find a scalar $c_x$ such that $T(x) = c_x x$. We want to prove that we can choose $c_x$ independently of $x$.

Suppose that $x$ and $y$ are linearly independent (in particular nonzero). Then $x + y \neq 0$ and the equality $T(x + y) = T(x) + T(y)$ can be written

$$c_{x+y}(x + y) = c_x x + c_y y$$

or equivalently

$$(c_{x+y} - c_x)x + (c_{x+y} - c_y)y = 0.$$

Since $x$ and $y$ are linearly independent, this forces $c_{x+y} = c_x = c_y$. Next, suppose that $x$ and $y$ are nonzero and linearly dependent. Thus $y = ax$ for some nonzero scalar $a$. Then $T(y) = aT(x)$ can be written $c_y y = ac_x x$, or equivalently $c_y y = c_x y$, and forces $c_y = c_x$. Thus as long as $x, y$ are nonzero vectors of $V$, we have $c_x = c_y$. Letting $c$ be the common value of the $c_x$ (when $x$ varies over the nonzero vectors in $V$) yields the desired result, since also $T(0) = 0 = c \cdot 0$. □


Let $V$ and $W$ be vector spaces over $F$ and let us denote by $\operatorname{Hom}(V, W)$ the set of linear transformations between $V$ and $W$. It is a subset of the vector space $M(V, W)$ of all maps $f : V \to W$. Recall that the addition and scalar multiplication in $M(V, W)$ are defined by

$$(f + g)(v) = f(v) + g(v), \qquad (cf)(v) = cf(v)$$

for $f, g \in M(V, W)$, $c \in F$ and $v \in V$.


Proposition 5.8. Let $V, W$ be vector spaces. The set $\operatorname{Hom}(V, W)$ of linear transformations between $V$ and $W$ is a subspace of $M(V, W)$.


Proof. We need to prove that the sum of two linear transformations is a linear transformation and that $cT$ is a linear transformation whenever $c$ is a scalar and $T$ is a linear transformation. Both assertions follow straight from the definition of a linear transformation. □


We introduce now a fundamental definition: 


Definition 5.9. The kernel (or null space) of a linear transformation $T : V \to W$ is

$$\ker T = \{v \in V \mid T(v) = 0\}.$$

The image (or range) $\operatorname{Im}(T)$ of $T$ is the set

$$\operatorname{Im}(T) = \{T(v) \mid v \in V\} \subset W.$$


The following criterion for injectivity is extremely useful and constantly used 
when dealing with linear maps. 


Proposition 5.10. If $T : V \to W$ is a linear transformation, then $T$ is injective if and only if $\ker T = \{0\}$.
Proof. Since $T(0) = 0$, it is clear that $\ker T = \{0\}$ if $T$ is injective. Conversely, assume that $\ker T = \{0\}$. If $T(v_1) = T(v_2)$, then

$$T(v_1 - v_2) = T(v_1) - T(v_2) = 0,$$

thus $v_1 - v_2 \in \ker T$ and so $v_1 = v_2$. Thus $T$ is injective. □


Problem 5.11. Find the dimension of the kernel of the linear map determined by the matrix

$$A = \begin{bmatrix} 1 & -2 & 1 & 0 \\ -1 & 2 & 1 & 2 \\ -2 & 4 & 0 & 2 \end{bmatrix} \in M_{3,4}(\mathbb{R}).$$


Solution. Let $T$ be the corresponding linear map, so that

$$T(x_1, x_2, x_3, x_4) = (x_1 - 2x_2 + x_3,\ -x_1 + 2x_2 + x_3 + 2x_4,\ -2x_1 + 4x_2 + 2x_4).$$

A vector $x = (x_1, \dots, x_4)$ belongs to $\ker(T)$ if and only if

$$\begin{cases} x_1 - 2x_2 + x_3 = 0 \\ -x_1 + 2x_2 + x_3 + 2x_4 = 0 \\ -2x_1 + 4x_2 + 2x_4 = 0 \end{cases}$$

The matrix associated with this system is $A$ and row-reduction yields

$$A_{\mathrm{ref}} = \begin{bmatrix} 1 & -2 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

Thus the previous system is equivalent to

$$\begin{cases} x_1 - 2x_2 - x_4 = 0 \\ x_3 + x_4 = 0 \end{cases}$$

We conclude that

$$\ker(T) = \{(x_1, x_2, 2x_2 - x_1, x_1 - 2x_2) \mid x_1, x_2 \in \mathbb{R}\}.$$

The last space is the span of the vectors $v_1 = (1, 0, -1, 1)$ and $v_2 = (0, 1, 2, -2)$, and since they are linearly independent (as $x_1v_1 + x_2v_2 = 0$ is equivalent to $(x_1, x_2, 2x_2 - x_1, x_1 - 2x_2) = (0, 0, 0, 0)$ and so to $x_1 = x_2 = 0$), it follows that $\dim \ker T = 2$. □
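The kernel computation above can be automated: after row-reduction, each free variable (here $x_2$ and $x_4$) gives one basis vector of $\ker T$, obtained by setting that variable to $1$, the other free variables to $0$, and solving for the pivot variables. A minimal Python sketch with helper names of our own choosing follows. Note that it returns a different basis from the one in the text, namely $(2, 1, 0, 0)$ and $(1, 0, -1, 1)$; it spans the same plane, since $(0, 1, 2, -2) = (2, 1, 0, 0) - 2(1, 0, -1, 1)$.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def null_space(rows):
    """One basis vector per free variable: free variable = 1, pivot variables solved for."""
    m, pivots = rref(rows)
    n_cols = len(rows[0])
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        x = [Fraction(0)] * n_cols
        x[free] = Fraction(1)
        for r, p in enumerate(pivots):
            x[p] = -m[r][free]
        basis.append(x)
    return basis


A = [[1, -2, 1, 0], [-1, 2, 1, 2], [-2, 4, 0, 2]]       # Problem 5.11
kernel = null_space(A)
print([[int(x) for x in v] for v in kernel])             # [[2, 1, 0, 0], [1, 0, -1, 1]]

B = [[1, -2, 1], [2, -3, 1], [1, 1, -2], [3, -1, -2]]    # the matrix of Problem 5.12
print([[int(x) for x in v] for v in null_space(B)])      # [[1, 1, 1]]
```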


Problem 5.12. Give a basis for the kernel of the linear map $T : \mathbb{R}^3 \to \mathbb{R}^4$ given by

$$T(x, y, z) = (x - 2y + z,\ 2x - 3y + z,\ x + y - 2z,\ 3x - y - 2z).$$
Solution. We need to find those $(x, y, z)$ for which $T(x, y, z) = 0$, in other words we need to solve the system

$$\begin{cases} x - 2y + z = 0 \\ 2x - 3y + z = 0 \\ x + y - 2z = 0 \\ 3x - y - 2z = 0 \end{cases}$$

The matrix of this homogeneous system is

$$A = \begin{bmatrix} 1 & -2 & 1 \\ 2 & -3 & 1 \\ 1 & 1 & -2 \\ 3 & -1 & -2 \end{bmatrix}$$

and row-reduction yields

$$A_{\mathrm{ref}} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Thus the system is equivalent to

$$\begin{cases} x - z = 0 \\ y - z = 0 \end{cases}$$

and its solutions are given by $(x, x, x)$ with $x \in \mathbb{R}$. In other words,

$$\ker(T) = \{(x, x, x) \mid x \in \mathbb{R}\}$$

and a basis is given by the vector $(1, 1, 1)$. □


Problem 5.13. Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ and let $F : M_2(\mathbb{R}) \to M_2(\mathbb{R})$ be the map defined by $F(X) = AX$.


(a) Prove that $F$ is a linear transformation.
(b) Find the dimension of $\ker(F)$ and a basis for $\ker(F)$.


Solution. (a) For any two matrices $X$ and $Y$ in $M_2(\mathbb{R})$ and any scalar $c$ we have

$$F(X + cY) = A(X + cY) = AX + cAY = F(X) + cF(Y),$$

thus $F$ is a linear transformation.
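(b) We need to find the dimension and a basis of the space of matrices that are solutions of the matrix equation $AX = 0$. This equation is equivalent to

$$\begin{bmatrix} x_{11} + x_{21} & x_{12} + x_{22} \\ x_{11} + x_{21} & x_{12} + x_{22} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},$$

or $x_{21} = -x_{11}$ and $x_{22} = -x_{12}$. Thus

$$\ker(F) = \left\{ \begin{bmatrix} x_{11} & x_{12} \\ -x_{11} & -x_{12} \end{bmatrix} \,\middle|\, x_{11}, x_{12} \in \mathbb{R} \right\}.$$

This last space is clearly two-dimensional, with a basis given by

$$\begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix}. \qquad \square$$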
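One way to double-check part (b) is to write down the $4 \times 4$ matrix of $F$ with respect to the basis $E_{11}, E_{12}, E_{21}, E_{22}$ of $M_2(\mathbb{R})$ and compute its rank. The Python sketch below does this (helper names are ours); by the rank-nullity theorem, $\dim \ker F = 4 - \operatorname{rank}$, which should equal $2$.

```python
from fractions import Fraction

A = [[1, 1], [1, 1]]


def F(X):
    """F(X) = AX for 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * X[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def flatten(X):
    """Coordinates of X in the basis E11, E12, E21, E22."""
    return [X[0][0], X[0][1], X[1][0], X[1][1]]


def rank(rows):
    """Number of pivots found by forward Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# E_pq has a single 1 in position (p, q).
basis = [[[1 if (i, j) == (p, q) else 0 for j in range(2)] for i in range(2)]
         for p in range(2) for q in range(2)]
# Column k of M is the coordinate vector of F(E_k).
M = [list(col) for col in zip(*(flatten(F(E)) for E in basis))]
print(M)            # [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
print(4 - rank(M))  # 2
```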
Proposition 5.14. If $T : V \to W$ is a linear transformation, then $\ker T$ and $\operatorname{Im}(T)$ are subspaces of $V$, respectively $W$. Moreover, if $V = W$, then $\ker T$ and $\operatorname{Im}(T)$ are stable under $T$.


Proof. Let $v_1, v_2 \in \ker T$ and let $c \in F$. We need to prove that $v_1 + cv_2 \in \ker T$. Indeed,

$$T(v_1 + cv_2) = T(v_1) + cT(v_2) = 0 + c \cdot 0 = 0.$$

Similarly, if $w_1, w_2 \in \operatorname{Im}(T)$, then we can write $w_1 = T(v_1)$ and $w_2 = T(v_2)$ for some $v_1, v_2 \in V$. Then

$$w_1 + cw_2 = T(v_1) + cT(v_2) = T(v_1 + cv_2) \in \operatorname{Im}(T)$$

for all scalars $c \in F$, thus $\operatorname{Im}(T)$ is a subspace of $W$.

It is clear that $\operatorname{Im}(T)$ is stable under $T$ if $V = W$. To see that $\ker T$ is stable under $T$, take $v \in \ker T$, so that $T(v) = 0$. Then $T(T(v)) = T(0) = 0$, thus $T(v) \in \ker T$ and so $\ker T$ is stable. □


The following problem gives a characterization of projections as those linear maps $T$ for which $T \circ T = T$.


Problem 5.15. Let $V$ be a vector space over a field $F$ and let $T : V \to V$ be a linear map on $V$. Prove that the following statements are equivalent:


a) $T$ is a projection.
b) We have $T \circ T = T$.
Moreover, if this is the case, then $\ker T \oplus \operatorname{Im}(T) = V$.
Solution. Assume that a) holds and let us prove b). Assume that $T$ is the projection onto $W_1$ along $W_2$ for some decomposition $V = W_1 \oplus W_2$. Take $v \in V$ and write $v = w_1 + w_2$ with $w_1 \in W_1$ and $w_2 \in W_2$. Then $T(v) = w_1$ and $T(T(v)) = T(w_1) = w_1$, hence $T(T(v)) = T(v)$ for all $v \in V$ and so $T \circ T = T$ and b) holds.

Assume now that $T \circ T = T$ and let us prove a). We start by proving that $\ker T \oplus \operatorname{Im}(T) = V$. Suppose that $v \in \ker T \cap \operatorname{Im}(T)$, so that $v = T(w)$ for some $w \in V$, and $T(v) = 0$. We deduce that

$$0 = T(v) = T(T(w)) = T(w),$$

hence $v = T(w) = 0$ and $\ker T \cap \operatorname{Im}(T) = \{0\}$. Next, let $v \in V$ and put $v_1 = v - T(v)$ and $v_2 = T(v)$. Clearly $v = v_1 + v_2$ and $v_2 \in \operatorname{Im}(T)$. Moreover,

$$T(v_1) = T(v - T(v)) = T(v) - T(T(v)) = 0$$

and so $v_1 \in \ker T$. Hence $v \in \ker T + \operatorname{Im}(T)$ and $\ker T \oplus \operatorname{Im}(T) = V$ holds.

Set $W_1 = \operatorname{Im}(T)$ and $W_2 = \ker T$. By what we have just proved, $V = W_1 \oplus W_2$, and $T(v) \in W_1$ for all $v \in V$. It suffices therefore to prove that $v - T(v) \in W_2$ for all $v \in V$, as this implies that $T$ is the projection onto $W_1$ along $W_2$. But $v - T(v) \in W_2$ if and only if $T(v - T(v)) = 0$, that is $T(v) = T^2(v)$, which follows from our assumption that b) holds. Note that the last statement of the problem has already been proved. □
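Example 5.3 g) and Problem 5.15 can be illustrated in $\mathbb{Q}^2$. The Python sketch below (names are ours; the subspaces $W_1 = \mathbb{Q}(1,0)$ and $W_2 = \mathbb{Q}(1,1)$ are an arbitrary choice) builds the projection onto $W_1$ along $W_2$ from the decomposition $v = a w_1 + b w_2$, and checks the two facts used in the solution: $T(T(v)) = T(v)$, and $v - T(v) \in \ker T$, so that $v = (v - T(v)) + T(v)$ splits $v$ along $\ker T \oplus \operatorname{Im}(T)$.

```python
from fractions import Fraction


def decompose(v, w1, w2):
    """Return (a, b) with v = a*w1 + b*w2, by Cramer's rule (w1, w2 a basis of Q^2)."""
    det = Fraction(w1[0] * w2[1] - w1[1] * w2[0])
    if det == 0:
        raise ValueError("w1 and w2 must be linearly independent")
    a = (v[0] * w2[1] - v[1] * w2[0]) / det
    b = (w1[0] * v[1] - w1[1] * v[0]) / det
    return a, b


def projection(w1, w2):
    """The projection onto W1 = span(w1) along W2 = span(w2)."""
    def p(v):
        a, _ = decompose(v, w1, w2)  # keep only the W1-component a*w1
        return (a * w1[0], a * w1[1])
    return p


T = projection((1, 0), (1, 1))
for v in [(3, 5), (-2, 7), (Fraction(1, 2), 4)]:
    tv = T(v)
    kernel_part = (v[0] - tv[0], v[1] - tv[1])
    assert T(tv) == tv               # T o T = T
    assert T(kernel_part) == (0, 0)  # v - T(v) lies in ker T
```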


Remark 5.16. We have a similar statement for symmetries assuming that $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$ (so we exclude $F = \mathbb{F}_2$). Namely, if $V$ is a vector space over $F$ and $T : V \to V$ is a linear map, then the following statements are equivalent:


a) $T$ is a symmetry.
b) $T \circ T = \mathrm{id}$, the identity map of $V$ (sending every vector of $V$ to itself).
Moreover, if this is the case then $V = \ker(T - \mathrm{id}) \oplus \ker(T + \mathrm{id})$.
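A small illustration of Remark 5.16 in $\mathbb{Q}^2$ (our own example, not from the text): the symmetry with respect to $\mathbb{Q}(1,0)$ along $\mathbb{Q}(1,1)$ has the matrix $S$ below. The Python sketch checks $S \circ S = \mathrm{id}$ and splits each $v$ as $\tfrac12(v + Sv) + \tfrac12(v - Sv)$, with the first term in $\ker(S - \mathrm{id})$ and the second in $\ker(S + \mathrm{id})$. This splitting is one standard way to obtain the decomposition in the remark, and the division by $2$ it requires is exactly what fails when $F = \mathbb{F}_2$.

```python
from fractions import Fraction

# (x, y) = (x - y)(1, 0) + y(1, 1) is sent to (x - y)(1, 0) - y(1, 1) = (x - 2y, -y).
S = [[1, -2], [0, -1]]


def apply(M, v):
    """Multiply the 2 x 2 matrix M by the vector v."""
    return tuple(sum(M[i][k] * v[k] for k in range(2)) for i in range(2))


def split(v):
    """Return (v_plus, v_minus) with v = v_plus + v_minus, S v_plus = v_plus, S v_minus = -v_minus."""
    sv = apply(S, v)
    half = Fraction(1, 2)  # needs 2 to be invertible in the field
    v_plus = tuple(half * (a + b) for a, b in zip(v, sv))
    v_minus = tuple(half * (a - b) for a, b in zip(v, sv))
    return v_plus, v_minus


for v in [(3, 5), (-1, 4), (0, 1)]:
    assert apply(S, apply(S, v)) == v                        # S o S = id
    v_plus, v_minus = split(v)
    assert apply(S, v_plus) == v_plus                        # v_plus in ker(S - id)
    assert apply(S, v_minus) == tuple(-x for x in v_minus)   # v_minus in ker(S + id)
    assert tuple(a + b for a, b in zip(v_plus, v_minus)) == v
```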


5.1.1 Problems for practice 


In the next problems F is a field. 


1. Let $f : \mathbb{C} \to \mathbb{C}$ be an $\mathbb{R}$-linear map. Prove the existence of complex numbers $a, b$ such that $f(z) = az + b\bar{z}$ for all $z \in \mathbb{C}$.
2. Consider the map $f : \mathbb{R}^4 \to \mathbb{R}^3$ defined by

$$f(x_1, x_2, x_3, x_4) = (x_1 + x_2 + x_3 + x_4,\ 2x_1 + x_2 - x_3 + x_4,\ x_1 - x_2 + x_3 - x_4).$$

a) Prove that $f$ is a linear map.
b) Give a basis for the kernel of $f$.
3. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed 3, and let the map $f : V \to \mathbb{R}^4$ be defined by


fP) = (P0), PQ), POD, PO). 


a) Prove that f is a linear map. 
b) Is f injective? 


4. Let $n$ be a positive integer and let $V$ be the space of real polynomials whose degree does not exceed $n$. Consider the map

$$f : V \to V, \quad f(P(X)) = P(X) + (1 - X)P'(X),$$

where $P'(X)$ is the derivative of $P$.

a) Explain why $f$ is a well-defined linear map.
b) Give a basis for the kernel of $f$.


5. Find all subspaces of $\mathbb{R}^2$ which are stable under the linear transformation

$$T : \mathbb{R}^2 \to \mathbb{R}^2, \quad T(x, y) = (x + y, -x + 2y).$$


6. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$. Let $T$ be the linear transformation on $V$ sending $P$ to its derivative. Find all subspaces of $V$ which are stable under $T$.


7. Let $T : \mathbb{R}[X] \to \mathbb{R}[X]$ be the map defined by

$$T(P(X)) = P(X) - 2(X^2 - 1)P''(X).$$

a) Prove that $T$ is a linear map.
b) Prove that for all $n \ge 0$, the space of polynomials with real coefficients whose degree does not exceed $n$ is stable under $T$.


8. Let $V$ be a vector space over a field $F$ and let $T_1, \dots, T_n : V \to V$ be linear transformations. Prove that

$$\bigcap_{i=1}^{n} \ker T_i \subset \ker\left(\sum_{i=1}^{n} T_i\right).$$


9. Let $V$ be a vector space over a field $F$ and let $T_1, T_2 : V \to V$ be linear transformations such that $T_1 \circ T_2 = T_1$ and $T_2 \circ T_1 = T_2$. Prove that $\ker T_1 = \ker T_2$.

10. Let $V$ be a vector space over $F$ and let $T : V \to V$ be a linear transformation such that

$$\ker T = \ker T^2 \quad \text{and} \quad \operatorname{Im} T = \operatorname{Im} T^2.$$

Prove that

$$V = \ker T \oplus \operatorname{Im} T.$$


11. For each of the following maps $T : \mathbb{R}^3 \to \mathbb{R}^3$, check that $T$ is linear and then check whether $\ker(T)$ and $\operatorname{Im}(T)$ are in direct sum position.

a) $T(x, y, z) = (x - 2y + z, x - z, x - 2y + z)$.
b) $T(x, y, z) = (3(x + y + z), 0, x + y + z)$.


12. Let $f : \mathbb{R} \to \mathbb{R}$ be a map such that $f(x + y) = f(x) + f(y)$ for all real numbers $x, y$. Prove that $f$ is a linear map of $\mathbb{Q}$-vector spaces between $\mathbb{R}$ and itself.

13. (Quotient space) Let $V$ be a finite dimensional vector space over $F$ and let $W \subset V$ be a subspace. For a vector $v \in V$, let

$$[v] = \{v + w : w \in W\}.$$

Note that $[v_1] = [v_2]$ if $v_1 - v_2 \in W$. Define the quotient space $V/W$ to be $\{[v] : v \in V\}$. Define an addition and scalar multiplication on $V/W$ by $[u] + [v] = [u + v]$ and $a[v] = [av]$. We recall that the addition and multiplication above are well defined and $V/W$ equipped with these operations is a vector space.

a) Show that the map $\pi : V \to V/W$ defined by $\pi(v) = [v]$ is linear with kernel $W$.

b) Show that $\dim(W) + \dim(V/W) = \dim(V)$.

c) Suppose $U \subset V$ is any subspace with $W \oplus U = V$. Show that $\pi|_U : U \to V/W$ is an isomorphism, i.e., a bijective linear map.

d) Let $T : V \to U$ be a linear map, let $W \subset \ker T$ be a subspace of $V$, and let $\pi : V \to V/W$ be the projection onto the quotient space. Show that there is a unique linear map $S : V/W \to U$ such that $T = S \circ \pi$.


5.2 Linear Maps and Linearly Independent Sets 


The following result relates linear maps and notions introduced in the previous chapter: spanning sets and linearly independent sets. In general, if $T : V \to W$ is linear, it is not true that the image of a linearly independent set in $V$ is linearly independent in $W$ (think about the zero linear map). However, if $T$ is injective, then this is the case, as the following proposition shows (dealing also with the analogous result for spanning sets).


Proposition 5.17. Let $T : V \to W$ be a linear transformation.

a) If $T$ is injective and if $L$ is a linearly independent set in $V$, then
$T(L) := \{T(l) : l \in L\}$ is a linearly independent set in $W$.
b) If $T$ is surjective and if $S$ is a spanning set in $V$, then $T(S)$ is a spanning set
in $W$.
c) If $T$ is bijective and if $\mathcal{B}$ is a basis in $V$, then $T(\mathcal{B})$ is a basis in $W$.


Proof. Part c) is simply a combination of a) and b), which we prove separately.

a) Suppose we have

$$c_1T(l_1) + \dots + c_nT(l_n) = 0$$

for some distinct $l_1, \dots, l_n \in L$ and some scalars $c_1, \dots, c_n$. The previous relation can be
written as $T(c_1l_1 + \dots + c_nl_n) = 0$, thus $c_1l_1 + \dots + c_nl_n \in \ker T$. Since $T$ is injective,
$\ker T = \{0\}$, and we deduce that $c_1l_1 + \dots + c_nl_n = 0$. Since $L$ is linearly independent,
$c_1 = c_2 = \dots = c_n = 0$. Thus $T(L)$ is linearly independent.

b) Let $w \in W$. Since $T$ is surjective, there is $v \in V$ such that $T(v) = w$. Since $S$
is a spanning set in $V$, we can write $v$ as a linear combination of some elements
$s_1, \dots, s_n$ of $S$, say $v = c_1s_1 + \dots + c_ns_n$ for some scalars $c_1, \dots, c_n$. Then

$$w = T(v) = T(c_1s_1 + \dots + c_ns_n) = c_1T(s_1) + \dots + c_nT(s_n).$$

Thus $w$ is in the span of $T(s_1), \dots, T(s_n)$, hence in the span of $T(S)$. Since $w \in W$
was arbitrary, the result follows. □


The following corollary is absolutely fundamental (especially part c)). It follows 
easily from the previous proposition and the rather subtle properties of finite 
dimensional vector spaces discussed in the previous chapter. 


Corollary 5.18. Let $V$ and $W$ be finite dimensional vector spaces and let
$T : V \to W$ be a linear transformation.

a) If $T$ is injective, then $\dim V \le \dim W$.
b) If $T$ is surjective, then $\dim V \ge \dim W$.
c) If $T$ is bijective, then $\dim V = \dim W$.

Proof. Again, part c) is a consequence of a) and b). For a), let $\mathcal{B}$ be a basis of $V$
and let $v_1, \dots, v_n$ be its elements. By Proposition 5.17, $T(v_1), \dots, T(v_n)$ are linearly
independent vectors in $W$. Thus $\dim W \ge n = \dim V$.

The argument for b) is similar, since Proposition 5.17 implies that the vectors
$T(v_1), \dots, T(v_n)$ form a spanning set for $W$, thus $n \ge \dim W$. □


We can sometimes prove the existence of a linear map $T$ without having to
explicitly write down the value of $T(x)$ for each vector $x$ in its domain: if the domain
is finite dimensional (this hypothesis is actually unnecessary), it suffices to give the
images of the elements of a basis of the domain. More precisely:

Proposition 5.19. Let $V, U$ be vector spaces over a field $F$. Let $\{v_1, v_2, \dots, v_n\}$ be
a basis of $V$ and let $u_1, u_2, \dots, u_n$ be any vectors in $U$. Then there is a
unique linear transformation $T : V \to U$ such that

$$T(v_1) = u_1, \quad T(v_2) = u_2, \quad \dots, \quad T(v_n) = u_n.$$
Proof. We start by proving uniqueness. Suppose that we have two linear
transformations $T, T' : V \to U$ such that $T(v_i) = T'(v_i) = u_i$ for $1 \le i \le n$. Then
$T - T'$ is a linear transformation which vanishes at $v_1, \dots, v_n$. Thus $\ker(T - T')$,
which is a subspace of $V$, contains the span of $v_1, \dots, v_n$, which is $V$. It follows that
$T = T'$.

Let us now prove existence. Take any vector $v \in V$. Since $v_1, \dots, v_n$ form a basis
of $V$, we can uniquely express $v$ as a linear combination $v = a_1v_1 + \dots + a_nv_n$
for some scalars $a_1, \dots, a_n \in F$. Define $T(v) = a_1u_1 + \dots + a_nu_n$. By definition
$T(v_i) = u_i$ for all $i$, and it remains to check that $T$ is a linear transformation. Let
$v, v' \in V$ and let $c$ be a scalar. Write $v = a_1v_1 + \dots + a_nv_n$ and $v' = b_1v_1 + \dots + b_nv_n$
for some scalars $a_i, b_j \in F$. Then

$$v + cv' = (a_1 + cb_1)v_1 + \dots + (a_n + cb_n)v_n,$$

thus
$$T(v + cv') = (a_1 + cb_1)u_1 + \dots + (a_n + cb_n)u_n = (a_1u_1 + \dots + a_nu_n) + c(b_1u_1 + \dots + b_nu_n) = T(v) + cT(v'),$$
which proves the linearity of $T$ and finishes the proof. □


Problem 5.20. Find a linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^4$ whose image is the
linear span of the set of vectors

$$\{(1, 2, 1, 1), (3, 1, 5, 2)\}.$$

Solution. Let $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$ be the standard basis
of $\mathbb{R}^3$. Let $v_1 = (1, 2, 1, 1)$ and $v_2 = (3, 1, 5, 2)$. By Proposition 5.19 there is a
linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^4$ such that

$$T(e_1) = v_1, \quad T(e_2) = v_2, \quad T(e_3) = 0.$$

The image of $T$ is precisely the set of linear combinations of $T(e_1)$, $T(e_2)$ and
$T(e_3)$, and this is clearly the span of $v_1, v_2$.

We note that $T$ is very far from being unique: we could have taken $T(e_3) = v_2$,
for instance (there are actually lots of linear maps with the desired property). □
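In coordinates, the map constructed in this solution is the $4 \times 3$ matrix whose columns are $T(e_1) = v_1$, $T(e_2) = v_2$ and $T(e_3) = 0$. The following minimal Python sketch (the function name `apply_T` is our own, chosen for illustration) evaluates $T(x) = x_1v_1 + x_2v_2 + x_3 \cdot 0$, so every output is visibly a combination of $v_1$ and $v_2$.

```python
# Images of the standard basis vectors chosen in the solution of Problem 5.20.
v1 = (1, 2, 1, 1)
v2 = (3, 1, 5, 2)
images = [v1, v2, (0, 0, 0, 0)]  # T(e1), T(e2), T(e3)


def apply_T(x):
    """Return T(x) = x[0]*T(e1) + x[1]*T(e2) + x[2]*T(e3) for x in R^3."""
    return tuple(sum(x[i] * images[i][k] for i in range(3)) for k in range(4))


print(apply_T((1, 0, 0)))   # (1, 2, 1, 1), that is v1
print(apply_T((2, -1, 7)))  # (-1, 3, -3, 0), that is 2*v1 - v2
```

Replacing the third entry of `images` by `v2` gives another valid answer, in line with the remark on non-uniqueness above.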


Problem 5.21. Let

$$v_1 = (1, 0, 0), \quad v_2 = (1, 1, 0), \quad v_3 = (1, 1, 1)$$
and let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be a linear transformation such that
$$T(v_1) = (3, 2), \quad T(v_2) = (-1, 2), \quad T(v_3) = (0, 1).$$

Compute the value of $T(5, 3, 1)$.
Solution. We look for scalars $a, b, c$ such that
$$(5, 3, 1) = av_1 + bv_2 + cv_3,$$
as then, by linearity,
$$T(5, 3, 1) = aT(v_1) + bT(v_2) + cT(v_3).$$
The equality
$$(5, 3, 1) = av_1 + bv_2 + cv_3$$
is equivalent to
$$(5, 3, 1) = (a, 0, 0) + (b, b, 0) + (c, c, c) = (a + b + c, b + c, c).$$

Thus $c = 1$, $b + c = 3$ and $a + b + c = 5$, which gives

$$a = 2, \qquad b = 2, \qquad c = 1.$$

It follows that

$$T(5, 3, 1) = 2T(v_1) + 2T(v_2) + T(v_3) = (6, 4) + (-2, 4) + (0, 1) = (4, 9).$$ □


Remark 5.22. One can easily check that $v_1, v_2, v_3$ form a basis of $\mathbb{R}^3$, thus such a
map exists by Proposition 5.19.
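The computation in Problem 5.21 can be checked mechanically. The sketch below, in Python with exact rational arithmetic, uses the data $T(v_1) = (3, 2)$, $T(v_2) = (-1, 2)$, $T(v_3) = (0, 1)$ from the solution; the helper name `solve_upper_triangular` is our own. Because the matrix with columns $v_1, v_2, v_3$ is upper triangular, back substitution finds the coordinates in the same order as the solution ($c$ first, then $b$, then $a$).

```python
from fractions import Fraction


def solve_upper_triangular(U, b):
    """Solve U x = b by back substitution (U upper triangular, nonzero diagonal)."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        rest = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - rest) / U[i][i]
    return x


# Columns of U are v1 = (1, 0, 0), v2 = (1, 1, 0), v3 = (1, 1, 1).
U = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 1]]
images = [(3, 2), (-1, 2), (0, 1)]  # T(v1), T(v2), T(v3)

a, b, c = solve_upper_triangular(U, (5, 3, 1))
value = tuple(a * images[0][k] + b * images[1][k] + c * images[2][k] for k in range(2))
print(a, b, c)  # 2 2 1
print(*value)   # 4 9
```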


Problem 5.23. Determine the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ such that
$$T(1, 0, 1) = (1, 0, 0), \quad T(0, 1, 1) = (0, 1, 0), \quad T(0, 0, 1) = (1, 1, 1).$$

Solution. We start with an arbitrary vector $v = (x_1, x_2, x_3)$ and look for scalars
$k_1, k_2, k_3$ such that

$$v = k_1(1, 0, 1) + k_2(0, 1, 1) + k_3(0, 0, 1).$$
If we find such scalars, then
$$T(v) = k_1T(1, 0, 1) + k_2T(0, 1, 1) + k_3T(0, 0, 1) = (k_1, 0, 0) + (0, k_2, 0) + (k_3, k_3, k_3) = (k_1 + k_3, k_2 + k_3, k_3).$$
The equality

$$v = k_1(1, 0, 1) + k_2(0, 1, 1) + k_3(0, 0, 1)$$
is equivalent to
$$(x_1, x_2, x_3) = (k_1, k_2, k_1 + k_2 + k_3)$$
or
$$k_1 = x_1, \qquad k_2 = x_2, \qquad k_3 = x_3 - x_1 - x_2.$$
Thus for all $(x_1, x_2, x_3) \in \mathbb{R}^3$

$$T(x_1, x_2, x_3) = (k_1 + k_3, k_2 + k_3, k_3) = (x_3 - x_2, x_3 - x_1, x_3 - x_1 - x_2).$$ □
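As a sanity check on the closed formula, the following Python sketch compares it with the definition of $T$ through the coordinates $k_1, k_2, k_3$, on the three prescribed vectors and on a small grid of integer vectors. The function names are illustrative only.

```python
def T_formula(x1, x2, x3):
    """Closed formula obtained at the end of Problem 5.23."""
    return (x3 - x2, x3 - x1, x3 - x1 - x2)


def T_from_basis(x1, x2, x3):
    """Evaluate T through the coordinates k1, k2, k3 of v in the basis
    (1, 0, 1), (0, 1, 1), (0, 0, 1), using the prescribed images."""
    k = (x1, x2, x3 - x1 - x2)                  # coordinates found in the solution
    images = [(1, 0, 0), (0, 1, 0), (1, 1, 1)]  # T(1,0,1), T(0,1,1), T(0,0,1)
    return tuple(sum(k[j] * images[j][i] for j in range(3)) for i in range(3))


# The prescribed values are recovered by the closed formula ...
assert T_formula(1, 0, 1) == (1, 0, 0)
assert T_formula(0, 1, 1) == (0, 1, 0)
assert T_formula(0, 0, 1) == (1, 1, 1)

# ... and both descriptions of T agree on a grid of integer vectors.
grid = range(-2, 3)
for v in ((x, y, z) for x in grid for y in grid for z in grid):
    assert T_formula(*v) == T_from_basis(*v)
print("formula confirmed on", len(grid) ** 3, "vectors")
```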


5.2.1 Problems for practice 


1. Describe the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ such that
$$T(0, 1, 1) = (1, 2, 3), \qquad T(1, 0, 1) = (1, -1, 2)$$
and
$$T(1, 1, 0) = (-1, -1, -1).$$
2. Is there a linear map $T : \mathbb{R}^2 \to \mathbb{R}^2$ such that
$$T(1, 1) = (1, 2), \qquad T(1, -1) = (1, 2), \qquad T(2, 3) = (1, 2)?$$
3. Find all real numbers $x$ for which there is a linear map $T : \mathbb{R}^3 \to \mathbb{R}^3$ such that
$$T(1, 1, 1) = (1, x, 1), \qquad T(1, 0, -1) = (1, 0, 1)$$
and
$$T(-1, -1, 0) = (1, 2, 3), \qquad T(1, -1, -1) = (1, x, -2).$$
4. Find a linear map $T : \mathbb{R}^4 \to \mathbb{R}^3$ whose image is the span of the vectors
$(-1, -1, -1)$ and $(1, 2, 3)$.
5. a) Let $V$ be the space of polynomials with real coefficients whose degree does
not exceed 3. Find all positive integers $n$ for which there is a bijective linear
map between $V$ and $M_n(\mathbb{R})$.

b) Answer the same question if the word bijective is replaced with injective.
c) Answer the same question if the word bijective is replaced with surjective.
5.3 Matrix Representation of Linear Transformations

We have already seen in the chapter devoted to matrices that all linear maps
$T : F^n \to F^m$ are described by matrices $A \in M_{m,n}(F)$. We will try to extend
this result and describe linear maps $T : V \to W$ between finite dimensional
vector spaces $V, W$ in terms of matrices. The description will not be canonical:
we will need to fix bases in $V$ and $W$. All vector spaces in this section are finite
dimensional over $F$.

It will be convenient to introduce the following definition: 


Definition 5.24. A linear transformation $T : V \to W$ is called an isomorphism of
vector spaces, or invertible, if it is bijective. In this case we write $V \simeq W$ (the map
$T$ being understood).


Problem 5.25. Let $T : V \to W$ be an isomorphism of vector spaces. Prove that its
inverse $T^{-1} : W \to V$ is an isomorphism of vector spaces.

Solution. The map $T^{-1}$ is clearly bijective, with inverse $T$. We only need to check
that $T^{-1}$ is linear, i.e.

$$T^{-1}(w_1 + cw_2) = T^{-1}(w_1) + cT^{-1}(w_2)$$

for all vectors $w_1, w_2 \in W$ and all scalars $c \in F$. Let $v_1 = T^{-1}(w_1)$ and
$v_2 = T^{-1}(w_2)$. Then $T(v_1) = w_1$ and $T(v_2) = w_2$, thus

$$T^{-1}(w_1 + cw_2) = T^{-1}(T(v_1) + cT(v_2)) = T^{-1}(T(v_1 + cv_2)) = v_1 + cv_2 = T^{-1}(w_1) + cT^{-1}(w_2),$$

as needed. □


It turns out that we can completely classify finite dimensional nonzero vector
spaces up to isomorphism: for each positive integer $n$, all vector spaces of
dimension $n$ are isomorphic to $F^n$. More precisely:

Theorem 5.26. Let $n$ be a positive integer and let $V$ be a vector space of dimension
$n$ over $F$. If $\mathcal{B} = (e_1, \dots, e_n)$ is a basis, then the map $i_{\mathcal{B}} : F^n \to V$ sending
$(x_1, \dots, x_n)$ to $x_1e_1 + x_2e_2 + \dots + x_ne_n$ is an isomorphism of vector spaces.

Proof. It is clear that $i_{\mathcal{B}}$ is linear, and by definition of a basis it is bijective. The result
follows. □


Remark 5.27. Conversely, if $T : V \to W$ is an isomorphism of vector spaces, then
necessarily $\dim V = \dim W$. This is Corollary 5.18 (recall that we only work with
finite dimensional vector spaces).


Thus the choice of a basis in a vector space of dimension $n$ allows us to identify
it with $F^n$. Consider now a linear map $T : V \to W$ and suppose that $\dim V = n$
and $\dim W = m$. Choose bases $\mathcal{B}_V = (v_1, \dots, v_n)$ and $\mathcal{B}_W = (w_1, \dots, w_m)$ in $V$
and $W$, respectively. By the previous theorem we have isomorphisms

$$i_{\mathcal{B}_V} : F^n \to V, \qquad i_{\mathcal{B}_W} : F^m \to W.$$

We produce a linear map $\varphi_T$ by composing the maps $i_{\mathcal{B}_V} : F^n \to V$, $T : V \to W$
and $i_{\mathcal{B}_W}^{-1} : W \to F^m$:

$$\varphi_T : F^n \to F^m, \qquad \varphi_T = i_{\mathcal{B}_W}^{-1} \circ T \circ i_{\mathcal{B}_V}.$$

Since $\varphi_T$ is a linear map between $F^n$ and $F^m$, it is uniquely described by a matrix
$A \in M_{m,n}(F)$. This is the matrix of $T$ with respect to the bases $\mathcal{B}_V$ and $\mathcal{B}_W$.
It depends heavily on the two bases, so we prefer to denote it $\operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T)$. We put
$\mathcal{B}_W$ (i.e., the basis on the target of $T$) before $\mathcal{B}_V$ (the basis at the source of $T$) in
the notation of the matrix for reasons which will become clear later on. When $V = W$
and we fix a basis $\mathcal{B}$ of $V$, we write $\operatorname{Mat}_{\mathcal{B}}(T)$ instead of $\operatorname{Mat}_{\mathcal{B}, \mathcal{B}}(T)$, the matrix of
$T$ with respect to the basis $\mathcal{B}$ both at the source and target of $T$.

The previous construction looks rather complicated, but it is very natural: we
have a parametrization of linear maps between $F^n$ and $F^m$ by matrices, and we can
extend it to a description of linear maps between $V$ and $W$ by identifying $V$ with
$F^n$ and $W$ with $F^m$, via the choice of bases in $V$ and $W$. Note the fundamental
relation

$$i_{\mathcal{B}_W}(AX) = T(i_{\mathcal{B}_V}(X)) \quad \text{if } X \in F^n \text{ and } A = \operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T). \qquad (5.1)$$


Taking for $X$ a vector in the canonical basis of $F^n$, we can make everything
completely explicit: let $e_1, \dots, e_n$ be the canonical basis of $F^n$ and $f_1, \dots, f_m$ the
canonical basis of $F^m$. If $A = [a_{ij}]$, then by definition $Ae_i = a_{1i}f_1 + \dots + a_{mi}f_m$,
thus for $X = e_i$ we have

$$i_{\mathcal{B}_W}(AX) = i_{\mathcal{B}_W}(a_{1i}f_1 + a_{2i}f_2 + \dots + a_{mi}f_m) = a_{1i}w_1 + a_{2i}w_2 + \dots + a_{mi}w_m.$$

On the other hand, $i_{\mathcal{B}_V}(e_i) = v_i$, thus (taking $X = e_i$ for $i = 1, \dots, n$ and using linearity)
relation (5.1) is equivalent to the fundamental and more concrete relation

$$T(v_i) = a_{1i}w_1 + a_{2i}w_2 + \dots + a_{mi}w_m. \qquad (5.2)$$

In other words:

Proposition 5.28. Let $T : V \to W$ be a linear transformation and let
$\mathcal{B}_V = (v_1, \dots, v_n)$, $\mathcal{B}_W = (w_1, \dots, w_m)$ be bases in $V$ and $W$. Then column $i$ of
$\operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T) \in M_{m,n}(F)$ consists of the coordinates of $T(v_i)$ with respect to the
basis $\mathcal{B}_W$. In other words, if $\operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T) = [a_{ij}]$ then for all $1 \le i \le n$ we have

$$T(v_i) = \sum_{j=1}^{m} a_{ji}w_j.$$
Problem 5.29. Find the matrix representation of the linear transformation
$T : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

$$T(x, y, z) = (x + 2y - z, y + z, x + y - 2z)$$

with respect to the standard basis of $\mathbb{R}^3$.

Solution. Let $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, $e_3 = (0, 0, 1)$ be the standard basis
of $\mathbb{R}^3$. Then

$$T(e_1) = T(1, 0, 0) = (1, 0, 1) = 1e_1 + 0e_2 + 1e_3,$$
$$T(e_2) = T(0, 1, 0) = (2, 1, 1) = 2e_1 + 1e_2 + 1e_3,$$
$$T(e_3) = T(0, 0, 1) = (-1, 1, -2) = -1e_1 + 1e_2 - 2e_3.$$

Thus the matrix representation of $T$ with respect to the standard basis is

$$\begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 1 \\ 1 & 1 & -2 \end{pmatrix}.$$ □
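Proposition 5.28 translates directly into a procedure: apply $T$ to each basis vector and use the resulting coordinates as the columns of the matrix. With the standard basis the coordinates are just the entries, so a minimal Python sketch for Problem 5.29 reads as follows (`standard_matrix` is a hypothetical helper name; matrices are stored as lists of rows).

```python
def T(x, y, z):
    """The map of Problem 5.29."""
    return (x + 2 * y - z, y + z, x + y - 2 * z)


def standard_matrix(f, n):
    """Matrix (list of rows) of a linear map f: R^n -> R^m in the standard bases.

    Column i holds f(e_i), as in Proposition 5.28.
    """
    columns = []
    for i in range(n):
        e = [0] * n
        e[i] = 1
        columns.append(f(*e))
    m = len(columns[0])
    return [[columns[j][r] for j in range(n)] for r in range(m)]


print(standard_matrix(T, 3))  # [[1, 2, -1], [0, 1, 1], [1, 1, -2]]
```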


Problem 5.30. Let $P_n$ be the vector space of polynomials with real coefficients of
degree less than $n$. A linear transformation $T : P_3 \to P_5$ is given by

$$T(P(X)) = P(X) + X^2P(X).$$

a) Find the matrix of this transformation with respect to the basis $\mathcal{B} = \{1, X + 1, X^2 + 1\}$
of $P_3$ and the standard basis $\mathcal{C} = \{1, X, X^2, X^3, X^4\}$ of $P_5$.
b) Show that $T$ is injective but not onto.

Solution. a) We need to find the coordinates of $T(1)$, $T(X + 1)$ and $T(X^2 + 1)$
with respect to the basis $\mathcal{C}$. We have

$$T(1) = 1 + X^2 = 1 \cdot 1 + 0 \cdot X + 1 \cdot X^2 + 0 \cdot X^3 + 0 \cdot X^4,$$
$$T(X + 1) = X + 1 + X^2(X + 1) = 1 + X + X^2 + X^3 = 1 \cdot 1 + 1 \cdot X + 1 \cdot X^2 + 1 \cdot X^3 + 0 \cdot X^4,$$
$$T(X^2 + 1) = X^2 + 1 + X^2(X^2 + 1) = 1 + 2X^2 + X^4 = 1 \cdot 1 + 0 \cdot X + 2 \cdot X^2 + 0 \cdot X^3 + 1 \cdot X^4.$$

It follows that the required matrix is

$$\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

b) Since $\dim P_5 = 5 > \dim P_3 = 3$, $T$ cannot be onto (Corollary 5.18). To prove that $T$ is
injective, it suffices to check that $\ker(T) = 0$. But if $P \in \ker(T)$, then
$P(X) + X^2P(X) = 0$, thus $(1 + X^2)P(X) = 0$. Since $1 + X^2 \neq 0$, it follows
that $P(X) = 0$ and so $\ker(T) = 0$. □
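The same column-by-column recipe works for polynomial spaces once each polynomial is stored as its list of coefficients $[c_0, c_1, \dots]$, which are exactly its coordinates in the basis $\mathcal{C} = (1, X, X^2, X^3, X^4)$. The Python sketch below (helper names are our own) recomputes the $5 \times 3$ matrix of Problem 5.30.

```python
def poly_mul(p, q):
    """Product of polynomials stored as coefficient lists [c0, c1, ...]."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def poly_add(p, q):
    """Sum of polynomials stored as coefficient lists."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def T(p):
    """T(P) = P + X^2 P."""
    return poly_add(p, poly_mul([0, 0, 1], p))


def coords_in_C(p, size=5):
    """Coordinates in C = (1, X, X^2, X^3, X^4): pad the coefficient list."""
    return p + [0] * (size - len(p))


basis_B = [[1], [1, 1], [1, 0, 1]]  # 1, X + 1, X^2 + 1
columns = [coords_in_C(T(b)) for b in basis_B]
matrix = [[col[r] for col in columns] for r in range(5)]
for row in matrix:
    print(row)
```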


Problem 5.31. Let $V$ be the space of polynomials with real coefficients whose
degree does not exceed $n$, a fixed positive integer. Consider the map

$$T : V \to V, \qquad T(P(X)) = P(X + 1).$$
(a) Prove that $T$ is an invertible linear transformation.
(b) What are the matrices of $T$ and $T^{-1}$ with respect to the basis $1, X, \dots, X^n$ of $V$?

Solution. a) It is not difficult to see that $T$ is a linear transformation, for if $P_1, P_2$
are vectors in $V$ and $c$ is a scalar, we have

$$T((P_1 + cP_2)(X)) = (P_1 + cP_2)(X + 1) = P_1(X + 1) + cP_2(X + 1) = T(P_1(X)) + cT(P_2(X)).$$

Next, to see that $T$ is invertible it suffices to prove that $T$ is bijective. We can
easily find the inverse of $T$ by solving the equation $P(X + 1) = Q(X)$. This is
equivalent to $P(X) = Q(X - 1)$, and $Q(X - 1)$ again has degree at most $n$; thus $T$ is
bijective and its inverse is given by $T^{-1}(Q(X)) = Q(X - 1)$.

b) For $0 \le j \le n$ the binomial formula yields

$$T(X^j) = (X + 1)^j = \sum_{i=0}^{j} \binom{j}{i} X^i$$

and

$$T^{-1}(X^j) = (X - 1)^j = \sum_{i=0}^{j} (-1)^{j-i} \binom{j}{i} X^i.$$

Thus if $A = [a_{ij}]$ and $B = [b_{ij}]$ are the matrices of $T$, respectively $T^{-1}$, with
respect to the basis $1, X, \dots, X^n$ (rows and columns being indexed by $0, 1, \dots, n$), we have
(with the standard convention that $\binom{n}{k} = 0$ for $n < k$)

$$a_{ij} = \binom{j}{i}, \qquad b_{ij} = (-1)^{j-i}\binom{j}{i}.$$
Remark 5.32. Since $T$ and $T^{-1}$ are inverse to each other, the product of the matrices
$A$ and $B$ is the identity matrix $I_{n+1}$. We invite the reader to use this in order to prove
the following combinatorial identity: for $0 \le i, k \le n$,

$$\sum_{j=0}^{n} (-1)^{k-j} \binom{j}{i} \binom{k}{j} = \delta_{ik},$$

where the right-hand side equals 1 if $i = k$ and 0 otherwise.
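Before proving the identity of Remark 5.32 by hand, one may want to test it numerically. The Python sketch below builds the matrices $A$ and $B$ of Problem 5.31 from the formulas $a_{ij} = \binom{j}{i}$ and $b_{ij} = (-1)^{j-i}\binom{j}{i}$ (rows and columns indexed from $0$) and checks that $AB = BA = I_{n+1}$ for small $n$. It relies on `math.comb` (Python 3.8 or later), which returns $0$ when $i > j$, matching the convention used above; the helper names are ours.

```python
from math import comb


def shift_matrices(n):
    """Matrices of P(X) -> P(X+1) and P(X) -> P(X-1) in the basis 1, X, ..., X^n.

    Entry (i, j) is the coefficient of X^i in (X+1)^j, respectively (X-1)^j.
    """
    size = n + 1
    A = [[comb(j, i) for j in range(size)] for i in range(size)]
    # (j - i) % 2 gives the parity of j - i (also for j < i, where comb is 0 anyway).
    B = [[comb(j, i) * (-1 if (j - i) % 2 else 1) for j in range(size)]
         for i in range(size)]
    return A, B


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


for n in range(1, 9):
    A, B = shift_matrices(n)
    identity = [[int(i == k) for k in range(n + 1)] for i in range(n + 1)]
    # Entry (i, k) of AB is sum_j C(j, i) (-1)^(k-j) C(k, j).
    assert mat_mul(A, B) == identity
    assert mat_mul(B, A) == identity
print("identity checked for n = 1, ..., 8")
```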


The next result follows formally from the fundamental bijection between linear
maps $F^n \to F^m$ and matrices in $M_{m,n}(F)$. Recall that $\operatorname{Hom}(V, W)$ is the vector
space of linear maps $T : V \to W$.

Theorem 5.33. Let $\mathcal{B}_V, \mathcal{B}_W$ be bases in two (finite dimensional) vector
spaces $V, W$, with $\dim V = n$ and $\dim W = m$. The map $T \mapsto \operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T)$ sending a
linear transformation $T : V \to W$ to its matrix with respect to $\mathcal{B}_V$ and $\mathcal{B}_W$ is an
isomorphism of vector spaces

$$\operatorname{Hom}(V, W) \to M_{m,n}(F).$$

Proof. Let $\varphi(T) = \operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T)$. It is clear from Proposition 5.28 that $\varphi$ is
a linear map from $\operatorname{Hom}(V, W)$ to $M_{m,n}(F)$. It is moreover injective, since if
$\varphi(T) = 0$, Proposition 5.28 yields $T(v_i) = 0$ for all $i$, thus $\ker T$ contains
$\operatorname{Span}(v_1, \dots, v_n) = V$ and $T = 0$. To see that $\varphi$ is surjective, start with any matrix
$A = [a_{ij}] \in M_{m,n}(F)$. It induces a linear transformation $\varphi_A : F^n \to F^m$ defined by
$X \mapsto AX$. By construction, the linear transformation $T = i_{\mathcal{B}_W} \circ \varphi_A \circ i_{\mathcal{B}_V}^{-1}$ satisfies
$\varphi(T) = A$. More concretely, since $v_1, \dots, v_n$ is a basis of $V$, there is a unique linear
map $T : V \to W$ such that

$$T(v_i) = \sum_{j=1}^{m} a_{ji}w_j$$

for all $1 \le i \le n$ (Proposition 5.19). By Proposition 5.28 we have $\operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T) = A$
and we are done. □


Recall that $\dim M_{m,n}(F) = mn$, a basis being given by the canonical basis
$(E_{ij})_{1 \le i \le m, 1 \le j \le n}$. The theorem and Remark 5.27 yield

$$\dim \operatorname{Hom}(V, W) = \dim V \cdot \dim W.$$


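We conclude this section with some rather technical issues, which are nevertheless
absolutely fundamental in the theory of linear transformations. First, we want to
understand the link between the matrix of a composition $S \circ T$ of linear maps and
the matrices of $T$ and $S$. More precisely, fix two linear maps $T : V \to W$ and
$S : W \to U$ and set $m = \dim V$, $n = \dim W$, $p = \dim U$. Also, fix three bases
$\mathcal{B}_U$, $\mathcal{B}_V$ and $\mathcal{B}_W$ in $U$, $V$, $W$ respectively. Let us write for simplicity

$$A = \operatorname{Mat}_{\mathcal{B}_U, \mathcal{B}_W}(S) \quad \text{and} \quad B = \operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T).$$

Corresponding to $\mathcal{B}_U$, $\mathcal{B}_V$, $\mathcal{B}_W$ we have isomorphisms
$$i_{\mathcal{B}_V} : F^m \to V, \qquad i_{\mathcal{B}_W} : F^n \to W, \qquad i_{\mathcal{B}_U} : F^p \to U$$
and by definition of $A$, $B$ we have (relation (5.1))
$$i_{\mathcal{B}_W}(BX) = T(i_{\mathcal{B}_V}(X)), \quad X \in F^m, \qquad i_{\mathcal{B}_U}(AY) = S(i_{\mathcal{B}_W}(Y)), \quad Y \in F^n.$$
Applying $S$ to the first relation and then using the second one with $Y = BX$, we obtain for $X \in F^m$
$$(S \circ T)(i_{\mathcal{B}_V}(X)) = S(i_{\mathcal{B}_W}(BX)) = i_{\mathcal{B}_U}(ABX).$$
This last relation and the definition of $\operatorname{Mat}_{\mathcal{B}_U, \mathcal{B}_V}(S \circ T)$ show that
$$\operatorname{Mat}_{\mathcal{B}_U, \mathcal{B}_V}(S \circ T) = A \cdot B.$$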


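In other words, composition of linear transformations comes down to multiplication
of matrices, or formally:

Theorem 5.34. Let $T : V \to W$ and $S : W \to U$ be linear transformations
between finite dimensional vector spaces and let $\mathcal{B}_U$, $\mathcal{B}_V$, $\mathcal{B}_W$ be bases of $U$, $V$
and $W$, respectively. Then

$$\operatorname{Mat}_{\mathcal{B}_U, \mathcal{B}_V}(S \circ T) = \operatorname{Mat}_{\mathcal{B}_U, \mathcal{B}_W}(S) \cdot \operatorname{Mat}_{\mathcal{B}_W, \mathcal{B}_V}(T).$$

A less technical corollary which will be constantly used is the following:
Corollary 5.35. Let $T_1, T_2 : V \to V$ be linear transformations on a finite
dimensional vector space $V$ and let $\mathcal{B}$ be a basis of $V$. Then

$$\operatorname{Mat}_{\mathcal{B}}(T_1 \circ T_2) = \operatorname{Mat}_{\mathcal{B}}(T_1) \cdot \operatorname{Mat}_{\mathcal{B}}(T_2).$$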


Problem 5.36. Let $V$ be the space of polynomials with real coefficients whose
degree does not exceed 2. Consider the maps

$$T : \mathbb{R}^3 \to V, \qquad T(a, b, c) = a + 2bX + 3cX^2$$
and

$$S : V \to M_2(\mathbb{R}), \qquad S(a + bX + cX^2) = \begin{pmatrix} a & a + b \\ a - c & b \end{pmatrix}.$$

We consider the basis $\mathcal{B}_1 = (1, X, X^2)$ of $V$, the canonical basis $\mathcal{B}_2$ of $\mathbb{R}^3$ and the
canonical basis $\mathcal{B}_3 = (E_{11}, E_{12}, E_{21}, E_{22})$ of $M_2(\mathbb{R})$.

a) Check that $T$ and $S$ are linear maps.
b) Write down the matrices of $T$ and $S$ with respect to the previous bases.
c) Find the matrix of the composition $S \circ T$ with respect to the previous bases.
d) Compute $S \circ T$ explicitly, then find its matrix directly with respect to the previous
bases and check that you obtain the same result as in c).


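Solution. a) Let $u$ be a real number and let $(a, b, c)$ and $(a', b', c')$ be vectors in $\mathbb{R}^3$.
Then
$$T(u(a, b, c) + (a', b', c')) = T(au + a', bu + b', cu + c') = (au + a') + 2(bu + b')X + 3(cu + c')X^2$$
$$= u(a + 2bX + 3cX^2) + (a' + 2b'X + 3c'X^2) = uT(a, b, c) + T(a', b', c'),$$
thus $T$ is linear. One checks similarly that $S$ is linear.

b) We start by computing the matrix $\operatorname{Mat}_{\mathcal{B}_1, \mathcal{B}_2}(T)$ of $T$ with respect to $\mathcal{B}_1$ and $\mathcal{B}_2$.
Let $\mathcal{B}_2 = (e_1, e_2, e_3)$ be the canonical basis of $\mathbb{R}^3$, then

$$T(e_1) = T(1, 0, 0) = 1 = 1 \cdot 1 + 0 \cdot X + 0 \cdot X^2,$$
$$T(e_2) = T(0, 1, 0) = 2X = 0 \cdot 1 + 2 \cdot X + 0 \cdot X^2,$$
$$T(e_3) = T(0, 0, 1) = 3X^2 = 0 \cdot 1 + 0 \cdot X + 3 \cdot X^2,$$

hence

$$\operatorname{Mat}_{\mathcal{B}_1, \mathcal{B}_2}(T) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$

Similarly, we compute

$$S(1) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = 1 \cdot E_{11} + 1 \cdot E_{12} + 1 \cdot E_{21} + 0 \cdot E_{22},$$
$$S(X) = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} = 0 \cdot E_{11} + 1 \cdot E_{12} + 0 \cdot E_{21} + 1 \cdot E_{22},$$
$$S(X^2) = \begin{pmatrix} 0 & 0 \\ -1 & 0 \end{pmatrix} = 0 \cdot E_{11} + 0 \cdot E_{12} + (-1) \cdot E_{21} + 0 \cdot E_{22},$$
hence
$$\operatorname{Mat}_{\mathcal{B}_3, \mathcal{B}_1}(S) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}.$$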



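c) Using the theorem, we obtain

$$\operatorname{Mat}_{\mathcal{B}_3, \mathcal{B}_2}(S \circ T) = \operatorname{Mat}_{\mathcal{B}_3, \mathcal{B}_1}(S) \cdot \operatorname{Mat}_{\mathcal{B}_1, \mathcal{B}_2}(T) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 0 & -3 \\ 0 & 2 & 0 \end{pmatrix}.$$

d) We compute

$$(S \circ T)(a, b, c) = S(a + 2bX + 3cX^2) = \begin{pmatrix} a & a + 2b \\ a - 3c & 2b \end{pmatrix}.$$
Next,
$$(S \circ T)(e_1) = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = 1 \cdot E_{11} + 1 \cdot E_{12} + 1 \cdot E_{21} + 0 \cdot E_{22},$$
$$(S \circ T)(e_2) = \begin{pmatrix} 0 & 2 \\ 0 & 2 \end{pmatrix} = 0 \cdot E_{11} + 2 \cdot E_{12} + 0 \cdot E_{21} + 2 \cdot E_{22}$$
and
$$(S \circ T)(e_3) = \begin{pmatrix} 0 & 0 \\ -3 & 0 \end{pmatrix} = 0 \cdot E_{11} + 0 \cdot E_{12} + (-3) \cdot E_{21} + 0 \cdot E_{22},$$
and so the matrix of $S \circ T$ is

$$\operatorname{Mat}_{\mathcal{B}_3, \mathcal{B}_2}(S \circ T) = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 0 & -3 \\ 0 & 2 & 0 \end{pmatrix},$$
which of course coincides with the one obtained in part c). □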
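Theorem 5.34 can also be checked numerically on Problem 5.36. In the Python sketch below, a polynomial $a + bX + cX^2$ is represented by its coordinates $(a, b, c)$ in $\mathcal{B}_1$, and a $2 \times 2$ matrix by its coordinates in $\mathcal{B}_3 = (E_{11}, E_{12}, E_{21}, E_{22})$, i.e. its entries read row by row; with these conventions each map becomes a function on coordinate tuples. The helper names are illustrative.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def T(a, b, c):
    """T(a, b, c) = a + 2bX + 3cX^2, returned as coordinates in B1 = (1, X, X^2)."""
    return (a, 2 * b, 3 * c)


def S(a, b, c):
    """S(a + bX + cX^2) = [[a, a + b], [a - c, b]], returned as coordinates
    in B3 = (E11, E12, E21, E22), i.e. the entries read row by row."""
    return (a, a + b, a - c, b)


def matrix_from_columns(f, n):
    """Column j is f applied to the j-th standard coordinate vector."""
    cols = [f(*[int(i == j) for i in range(n)]) for j in range(n)]
    return [[col[r] for col in cols] for r in range(len(cols[0]))]


mat_T = matrix_from_columns(T, 3)                                # Mat_{B1,B2}(T)
mat_S = matrix_from_columns(S, 3)                                # Mat_{B3,B1}(S)
mat_ST = matrix_from_columns(lambda a, b, c: S(*T(a, b, c)), 3)  # Mat_{B3,B2}(S o T)

assert mat_ST == mat_mul(mat_S, mat_T)  # Theorem 5.34
print(mat_ST)  # [[1, 0, 0], [1, 2, 0], [1, 0, -3], [0, 2, 0]]
```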


Problem 5.37. Let $A \in M_n(F)$ and let $T : F^n \to F^n$ be the linear map sending $X$
to $AX$. Prove that $A$ is invertible if and only if $T$ is bijective.

Solution. If $A$ is invertible, let $B \in M_n(F)$ be such that $AB = BA = I_n$. Let
$S : F^n \to F^n$ be the map $X \mapsto BX$. Then $S \circ T$ has associated matrix (with respect
to the canonical basis of $F^n$) $BA = I_n$, thus $S \circ T = \mathrm{id}$. Similarly, $T \circ S = \mathrm{id}$, thus
$T$ is bijective.

Next, suppose that $T$ is bijective and let $B$ be the matrix of $T^{-1}$ with respect
to the canonical basis. Then $AB$ is the matrix of $T \circ T^{-1} = \mathrm{id}$ with respect to
the canonical basis, thus $AB = I_n$. Similarly $BA = I_n$, and $A$ is invertible with
inverse $B$. □
Next, suppose that we have a linear map $T : V \to W$, with a given matrix $A$ with
respect to two bases $\mathcal{B}_1$, $\mathcal{C}_1$ of $V$, $W$ respectively. Let us choose two new bases $\mathcal{B}_2$, $\mathcal{C}_2$
of $V$, $W$ respectively. We would like to understand the matrix of $T$ with respect to
the new bases. To answer this, we need to introduce an important object:

Definition 5.38. Let $V$ be a vector space and let $\mathcal{B} = (v_1, \dots, v_n)$ and
$\mathcal{B}' = (v'_1, \dots, v'_n)$ be two bases of $V$. The change of basis matrix from $\mathcal{B}$ to $\mathcal{B}'$ is the
matrix $P = [p_{ij}]$ whose columns are the coordinates of the vectors $v'_1, \dots, v'_n$ when
expressed in the basis $\mathcal{B}$. Thus

$$v'_j = p_{1j}v_1 + \dots + p_{nj}v_n$$

for $1 \le j \le n$.


Problem 5.39. Consider the vectors

$$v_1 = (1, 2), \qquad v_2 = (1, 3).$$

a) Prove that $\mathcal{B}' = (v_1, v_2)$ is a basis of $\mathbb{R}^2$.
b) Find the change of basis matrix from $\mathcal{B}'$ to the canonical basis of $\mathbb{R}^2$.

Solution. a) Since $\dim \mathbb{R}^2 = 2$, it suffices to check that $v_1$ and $v_2$ are linearly independent.
If $av_1 + bv_2 = 0$ for some real numbers $a, b$, then

$$(a, 2a) + (b, 3b) = (0, 0),$$
thus

$$a + b = 0, \qquad 2a + 3b = 0.$$

Replacing $b = -a$ in the second equation yields $a = 0$, hence $a = b = 0$.
b) Let $\mathcal{B} = (e_1, e_2)$ be the canonical basis. We need to express $e_1, e_2$ in terms of
$v_1, v_2$. Let us look for $a, b$ such that

$$e_1 = av_1 + bv_2.$$
Equivalently, we want
$$a + b = 1, \qquad 2a + 3b = 0.$$
This has the unique solution $a = 3$, $b = -2$, thus
$$e_1 = 3v_1 - 2v_2.$$

Similarly, we obtain

$$e_2 = -v_1 + v_2.$$

The coordinates $3, -2$ of $e_1$ when written in the basis $\mathcal{B}'$ yield the first column of the
change of basis matrix, and the coordinates $-1, 1$ of $e_2$ when written in the basis $\mathcal{B}'$
yield the second column, thus the change of basis matrix is

$$P = \begin{pmatrix} 3 & -1 \\ -2 & 1 \end{pmatrix}.$$ □
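There is a quicker way to obtain this matrix. Let $M$ be the matrix whose columns are $v_1, v_2$; multiplying $M$ by the coordinates of a vector in $\mathcal{B}'$ returns its coordinates in the canonical basis, so $MP = I_2$ and $P = M^{-1}$. A short Python sketch of this shortcut with exact fractions (`inverse_2x2` is our own helper):

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]], assuming ad - bc != 0."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


# Columns of M are v1 = (1, 2) and v2 = (1, 3).
M = [[1, 1],
     [2, 3]]
P = inverse_2x2(M)  # change of basis matrix from B' to the canonical basis
print([[str(x) for x in row] for row in P])  # [['3', '-1'], ['-2', '1']]
```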
Problem 5.40. Let $V$, $\mathcal{B}$, $\mathcal{B}'$, $P$ be as above. Consider a vector $v \in V$ and let $X$ and
$X'$ be the column vectors representing the coordinates of $v$ with respect to the bases
$\mathcal{B}$ and $\mathcal{B}'$. Prove that $X = PX'$.

Solution. Indeed, by definition we have

$$v = x_1v_1 + \dots + x_nv_n = x'_1v'_1 + \dots + x'_nv'_n,$$

thus
$$\sum_{k=1}^{n} x_kv_k = \sum_{j=1}^{n} x'_jv'_j = \sum_{j=1}^{n} x'_j \sum_{k=1}^{n} p_{kj}v_k = \sum_{k=1}^{n} \Big(\sum_{j=1}^{n} p_{kj}x'_j\Big) v_k = \sum_{k=1}^{n} (PX')_k v_k$$
and since $v_1, \dots, v_n$ are linearly independent, it follows that $X = PX'$. □
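Here is a small numerical illustration of $X = PX'$, written as a Python sketch. We take $\mathcal{B}$ to be the canonical basis of $\mathbb{R}^2$ and $\mathcal{B}' = (v_1, v_2)$ with $v_1 = (1, 2)$, $v_2 = (1, 3)$ as in Problem 5.39, so that the change of basis matrix from $\mathcal{B}$ to $\mathcal{B}'$ has columns $v_1, v_2$; the vector $v = 2v_1 + 5v_2$ is an arbitrary choice made for illustration.

```python
def mat_vec(M, x):
    """Matrix-vector product, M stored as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


# Change of basis matrix from the canonical basis B to B' = (v1, v2):
# its columns are v1 = (1, 2) and v2 = (1, 3).
P = [[1, 1],
     [2, 3]]

X_prime = [2, 5]                     # coordinates of v in B'
v = [2 * 1 + 5 * 1, 2 * 2 + 5 * 3]   # v = 2*v1 + 5*v2, computed directly
X = mat_vec(P, X_prime)              # Problem 5.40: X = P X'
assert X == v
print(X)  # [7, 19]
```

Note the change of direction stressed in the remark that follows: $P$ lists $\mathcal{B}'$ in terms of $\mathcal{B}$, yet it converts $\mathcal{B}'$-coordinates into $\mathcal{B}$-coordinates.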


Remark 5.41. The previous definition and problem are a constant source of confusion
and trouble, so let us insist on the following point: the change of basis matrix from
$\mathcal{B}$ to $\mathcal{B}'$ expresses $\mathcal{B}'$ in terms of $\mathcal{B}$; however (and this is very important in practice),
as the problem shows, the change of basis matrix takes coordinates with respect to
$\mathcal{B}'$ to coordinates with respect to $\mathcal{B}$. Thus we have a change of direction.


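We also write $\operatorname{Mat}_{\mathcal{B}}(\mathcal{B}')$ for the change of basis matrix from $\mathcal{B}$ to $\mathcal{B}'$. A simple
but very important observation is that

$$\operatorname{Mat}_{\mathcal{B}}(\mathcal{B}') = \operatorname{Mat}_{\mathcal{B}, \mathcal{B}'}(\mathrm{id}_V),$$

as follows directly from Proposition 5.28. Using this observation and Theorem 5.34,
we deduce that for any bases $\mathcal{B}, \mathcal{B}', \mathcal{B}''$ of $V$ we have

$$\operatorname{Mat}_{\mathcal{B}}(\mathcal{B}') \cdot \operatorname{Mat}_{\mathcal{B}'}(\mathcal{B}'') = \operatorname{Mat}_{\mathcal{B}}(\mathcal{B}'').$$
Since $\operatorname{Mat}_{\mathcal{B}}(\mathcal{B}) = I_n$ for any basis $\mathcal{B}$, we deduce that
$$\operatorname{Mat}_{\mathcal{B}}(\mathcal{B}') \cdot \operatorname{Mat}_{\mathcal{B}'}(\mathcal{B}) = I_n.$$

Thus the change of basis matrix is invertible and its inverse is simply the change
of basis matrix for the bases $\mathcal{B}'$, $\mathcal{B}$.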
Problem 5.42. Consider the families of vectors $\mathcal{B} = (v_1, v_2, v_3)$, $\mathcal{B}' = (w_1, w_2, w_3)$,
where

$$v_1 = (0, 1, 1), \qquad v_2 = (1, 0, 1), \qquad v_3 = (1, 1, 0)$$

and
$$w_1 = (1, 1, -1), \qquad w_2 = (1, 0, -1), \qquad w_3 = (-1, -1, 0).$$
a) Prove that $\mathcal{B}$ and $\mathcal{B}'$ are bases of $\mathbb{R}^3$.
b) Find the change of basis matrix $P$ from $\mathcal{B}$ to $\mathcal{B}'$ going back to the definition of $P$.

c) Find the change of basis matrix $P$ using the canonical basis of $\mathbb{R}^3$ and the
previous theorem.

Solution. a) To prove that $\mathcal{B}$ is a basis, we find the reduced row-echelon form of
the matrix

$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

using row-reduction. This yields
$$A_{\text{ref}} = I_3$$
and so $v_1, v_2, v_3$ are linearly independent, hence a basis of $\mathbb{R}^3$. We proceed
similarly with $w_1, w_2, w_3$.
b) First, we use the definition of $P$: the columns of $P$ are the coordinates of
$w_1, w_2, w_3$ when expressed in the basis $\mathcal{B}$. We start by trying to express
$$w_1 = av_1 + bv_2 + cv_3 = (b + c, a + c, a + b),$$
which gives

$$b + c = 1, \qquad a + c = 1, \qquad a + b = -1,$$

with the solution

$$a = -\tfrac{1}{2}, \qquad b = -\tfrac{1}{2}, \qquad c = \tfrac{3}{2}.$$

Thus
$$w_1 = -\tfrac{1}{2}v_1 - \tfrac{1}{2}v_2 + \tfrac{3}{2}v_3$$
and the first column of $P$ is $\left(-\tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{3}{2}\right)^T$. Similar arguments yield
$$w_2 = -v_1 + v_3,$$
hence the second column of $P$ is $(-1, 0, 1)^T$, and finally $w_3 = -v_3$, thus the third
column of $P$ is $(0, 0, -1)^T$. We conclude that
$$P = \begin{pmatrix} -\tfrac{1}{2} & -1 & 0 \\ -\tfrac{1}{2} & 0 & 0 \\ \tfrac{3}{2} & 1 & -1 \end{pmatrix}.$$

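c) Let $\mathcal{B}'' = (e_1, e_2, e_3)$ be the canonical basis of $\mathbb{R}^3$. We want to find $\operatorname{Mat}_{\mathcal{B}}(\mathcal{B}')$
and we write it as

$$\operatorname{Mat}_{\mathcal{B}}(\mathcal{B}') = \operatorname{Mat}_{\mathcal{B}}(\mathcal{B}'') \cdot \operatorname{Mat}_{\mathcal{B}''}(\mathcal{B}') = (\operatorname{Mat}_{\mathcal{B}''}(\mathcal{B}))^{-1} \cdot \operatorname{Mat}_{\mathcal{B}''}(\mathcal{B}').$$
Next, by definition

$$\operatorname{Mat}_{\mathcal{B}''}(\mathcal{B}) = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \qquad \operatorname{Mat}_{\mathcal{B}''}(\mathcal{B}') = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}.$$

Next, using either the row-reduction algorithm or by solving the system
$$\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} X = b,$$
one computes
$$\begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} -\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix}$$
and finally
$$P = \begin{pmatrix} -\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix} \begin{pmatrix} 1 & 1 & -1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix} = \begin{pmatrix} -\tfrac{1}{2} & -1 & 0 \\ -\tfrac{1}{2} & 0 & 0 \\ \tfrac{3}{2} & 1 & -1 \end{pmatrix}.$$
As expected, we obtain the same result as in part b)! □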
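Part c) is easy to automate. The Python sketch below inverts $\operatorname{Mat}_{\mathcal{B}''}(\mathcal{B})$ by Gauss-Jordan elimination over the rationals (using `fractions.Fraction` to avoid rounding) and multiplies by $\operatorname{Mat}_{\mathcal{B}''}(\mathcal{B}')$. The helper names are ours, and the elimination assumes the input matrix is invertible, as shown in part a).

```python
from fractions import Fraction


def inverse(M):
    """Inverse of an invertible square matrix by Gauss-Jordan elimination."""
    n = len(M)
    # Augment M with the identity matrix, using exact rational entries.
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


# Columns are the coordinates of v1, v2, v3 (resp. w1, w2, w3) in the canonical basis.
mat_B = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]        # Mat_{B''}(B)
mat_Bp = [[1, 1, -1], [1, 0, -1], [-1, -1, 0]]   # Mat_{B''}(B')

P = mat_mul(inverse(mat_B), mat_Bp)              # Mat_B(B')
expected = [[Fraction(-1, 2), -1, 0],
            [Fraction(-1, 2), 0, 0],
            [Fraction(3, 2), 1, -1]]
assert P == expected
```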


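Similar arguments give the following fundamental result:

Theorem 5.43. Let $T : V \to W$ be a linear map, let $\mathcal{B}_1, \mathcal{B}_2$ be two bases of $V$ and
$\mathcal{C}_1, \mathcal{C}_2$ two bases of $W$. If $P = \operatorname{Mat}_{\mathcal{C}_1}(\mathcal{C}_2)$ and $Q = \operatorname{Mat}_{\mathcal{B}_1}(\mathcal{B}_2)$ are the change of
basis matrices, then
$$\operatorname{Mat}_{\mathcal{C}_2, \mathcal{B}_2}(T) = P^{-1} \operatorname{Mat}_{\mathcal{C}_1, \mathcal{B}_1}(T) Q.$$

Proof. By Theorem 5.34 we have

$$P \operatorname{Mat}_{\mathcal{C}_2, \mathcal{B}_2}(T) = \operatorname{Mat}_{\mathcal{C}_1, \mathcal{C}_2}(\mathrm{id}_W) \cdot \operatorname{Mat}_{\mathcal{C}_2, \mathcal{B}_2}(T) = \operatorname{Mat}_{\mathcal{C}_1, \mathcal{B}_2}(T)$$
and similarly

$$\operatorname{Mat}_{\mathcal{C}_1, \mathcal{B}_1}(T) Q = \operatorname{Mat}_{\mathcal{C}_1, \mathcal{B}_1}(T) \operatorname{Mat}_{\mathcal{B}_1, \mathcal{B}_2}(\mathrm{id}_V) = \operatorname{Mat}_{\mathcal{C}_1, \mathcal{B}_2}(T).$$

Thus

$$P \operatorname{Mat}_{\mathcal{C}_2, \mathcal{B}_2}(T) = \operatorname{Mat}_{\mathcal{C}_1, \mathcal{B}_1}(T) Q$$

and the result follows by multiplying on the left by $P^{-1}$ (recall that $P$ is invertible). □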


Here is a different proof which has the advantage that it also shows us how to
recover the rather complicated formula in the previous theorem (experience shows
that it is almost impossible to learn this formula by heart). It assumes familiarity
with the result of Problem 5.40 (which is, however, much easier to remember!).

Write $A_1$ for the matrix of $T$ with respect to $\mathcal{B}_1$, $\mathcal{C}_1$ and $A_2$ for the matrix of $T$
with respect to $\mathcal{B}_2$, $\mathcal{C}_2$. Start with a vector $v$ in $V$ and write $X_1$, $X_2$ for its coordinates
with respect to $\mathcal{B}_1$ and $\mathcal{B}_2$ respectively. By Problem 5.40 we have $X_1 = QX_2$. Let
$Y_1$, $Y_2$ be the coordinates of $T(v)$ with respect to $\mathcal{C}_1$ and $\mathcal{C}_2$ respectively. Again by
Problem 5.40 we have $Y_1 = PY_2$. On the other hand, by definition of $A_1$ and $A_2$
we have $A_1X_1 = Y_1$ and $A_2X_2 = Y_2$. Since $P$ and $Q$ are invertible, we obtain
$X_2 = Q^{-1}X_1$ and so

$$A_1X_1 = Y_1 = PY_2 = PA_2X_2 = PA_2Q^{-1}X_1.$$

Since this holds for every $v \in V$ (equivalently, for any $X_1$), we deduce that
$A_1 = PA_2Q^{-1}$ and so $A_2 = P^{-1}A_1Q$.
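The second proof is easy to replay on a concrete example. In the Python sketch below, the matrices $A_1$, $P$ and $Q$ are hypothetical $2 \times 2$ examples chosen only for illustration (any invertible $P$, $Q$ would do). We compute $A_2 = P^{-1}A_1Q$ and check, for a few coordinate vectors $X_2$, that $Y_1 = PY_2$, where $X_1 = QX_2$, $Y_1 = A_1X_1$ and $Y_2 = A_2X_2$.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]], assuming ad - bc != 0."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_vec(M, x):
    """Matrix-vector product."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


# Hypothetical data, chosen only for illustration.
A1 = [[1, 2], [3, 4]]  # Mat_{C1,B1}(T)
Q = [[1, 1], [0, 1]]   # Mat_{B1}(B2)
P = [[2, 1], [1, 1]]   # Mat_{C1}(C2)

A2 = mat_mul(mat_mul(inverse_2x2(P), A1), Q)  # Theorem 5.43

for X2 in ([1, 0], [0, 1], [3, -7]):
    X1 = mat_vec(Q, X2)          # coordinates of v in B1 (Problem 5.40)
    Y1 = mat_vec(A1, X1)         # coordinates of T(v) in C1
    Y2 = mat_vec(A2, X2)         # coordinates of T(v) in C2
    assert Y1 == mat_vec(P, Y2)  # Problem 5.40 again: Y1 = P Y2
print([[str(x) for x in row] for row in A2])  # [['-2', '-4'], ['5', '11']]
```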

While the previous results are quite a pain in the neck to state and remember,
the following special case is absolutely fundamental and rather easy to remember
(or reprove):

Corollary 5.44. Let $T : V \to V$ be a linear transformation on a finite dimensional
vector space $V$ and let $\mathcal{B}, \mathcal{B}'$ be bases of $V$. If $P$ is the change of basis matrix from
$\mathcal{B}$ to $\mathcal{B}'$, then

$$\operatorname{Mat}_{\mathcal{B}'}(T) = P^{-1} \operatorname{Mat}_{\mathcal{B}}(T) P.$$
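Here is how one should recover this result in case of doubt: write $X$, $X'$ for the
column vectors representing the coordinates of $v \in V$ with respect to $\mathcal{B}$, $\mathcal{B}'$, and
$X_{T(v)}$, $X'_{T(v)}$ for the coordinates of $T(v)$ with respect to $\mathcal{B}$, $\mathcal{B}'$. Then
$$X_{T(v)} = \operatorname{Mat}_{\mathcal{B}}(T)X, \qquad X'_{T(v)} = \operatorname{Mat}_{\mathcal{B}'}(T)X',$$

and by Problem 5.40 we have $X = PX'$ and $X_{T(v)} = PX'_{T(v)}$. Combining
these relations yields

$$P \operatorname{Mat}_{\mathcal{B}'}(T)X' = \operatorname{Mat}_{\mathcal{B}}(T)PX',$$
both sides being equal to $PX'_{T(v)}$. Since $X'$ is arbitrary, $P \operatorname{Mat}_{\mathcal{B}'}(T) = \operatorname{Mat}_{\mathcal{B}}(T)P$,
and multiplying on the left by $P^{-1}$ yields the desired result.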


Problem 5.45. Consider the matrix

$$A = \begin{pmatrix} 2 & -1 & 0 \\ -2 & 1 & -2 \\ 1 & 1 & 3 \end{pmatrix}$$

and let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the associated linear transformation, thus $T(X) = AX$ for
all $X \in \mathbb{R}^3$. Consider the vectors

$$v_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}.$$

a) Prove that $v_1, v_2, v_3$ form a basis of $\mathbb{R}^3$ and compute the matrix of $T$ with respect
to this basis.

b) Find the change of basis matrix from the canonical basis to the basis $(v_1, v_2, v_3)$.

c) Compute $A^n$ for all positive integers $n$.

Solution. a) It suffices to check that $v_1, v_2, v_3$ are linearly independent. If $a, b, c$ are
real numbers such that $av_1 + bv_2 + cv_3 = 0$, we obtain

$$a + b + c = 0, \qquad a - c = 0, \qquad -a - b = 0.$$

The first and third equations yield $c = 0$, then the second one gives $a = 0$ and
finally $b = 0$. Thus $v_1, v_2, v_3$ are linearly independent and hence they form a
basis. Another method for proving this is as follows: consider the matrix whose
columns are the vectors $v_1, v_2, v_3$ and use row-reduction to bring this matrix to its
reduced row-echelon form. We end up with $I_3$ and this shows that $v_1, v_2, v_3$ is a
basis of $\mathbb{R}^3$.

To compute the matrix of $T$ with respect to the new basis, we simply express
each of the vectors $T(v_1), T(v_2), T(v_3)$ in terms of $v_1, v_2, v_3$. We have

$$T(v_1) = Av_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} = v_1,$$
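then
$$T(v_2) = Av_2 = \begin{pmatrix} 2 \\ 0 \\ -2 \end{pmatrix} = 2v_2$$
and
$$T(v_3) = Av_3 = \begin{pmatrix} 3 \\ -3 \\ 0 \end{pmatrix} = 3v_3.$$

We conclude that the matrix of $T$ with respect to the basis $(v_1, v_2, v_3)$ is

$$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$

b) Call the change of basis matrix $P$. By definition, the columns of $P$ consist of the
coordinates of $v_1, v_2, v_3$ with respect to the canonical basis of $\mathbb{R}^3$. Thus

$$P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}.$$

c) The matrix of $T$ with respect to $(v_1, v_2, v_3)$ is, thanks to the change of basis
formula (Corollary 5.44), equal to $P^{-1}AP$. Combining this observation with part a), we deduce
that

$$P^{-1}AP = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$

Raising this equality to the $n$th power and taking into account that
$(P^{-1}AP)^n = P^{-1}A^nP$ (this follows easily by induction on $n$) yields

$$P^{-1}A^nP = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2^n & 0 \\ 0 & 0 & 3^n \end{pmatrix}.$$

It follows that
$$A^n = P \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2^n & 0 \\ 0 & 0 & 3^n \end{pmatrix} P^{-1}.$$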
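We can easily compute $P^{-1}$ either by expressing the vectors of the canonical
basis in terms of $v_1, v_2, v_3$, or by solving the system $PX = b$. We end up with

$$P^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -2 \\ 1 & 0 & 1 \end{pmatrix}.$$
Finally,
$$A^n = P \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2^n & 0 \\ 0 & 0 & 3^n \end{pmatrix} P^{-1} = \begin{pmatrix} 1 - 2^n + 3^n & 1 - 2^n & 1 - 2^{n+1} + 3^n \\ 1 - 3^n & 1 & 1 - 3^n \\ 2^n - 1 & 2^n - 1 & 2^{n+1} - 1 \end{pmatrix}.$$ □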
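Closed formulas such as the one for $A^n$ are easy to get wrong, so it is worth checking them. The Python sketch below (integer arithmetic only; helper names are ours) verifies that the matrix $P^{-1}$ found above is indeed the inverse of $P$, that $P^{-1}AP = \operatorname{diag}(1, 2, 3)$, and that the closed formula agrees with repeated multiplication by $A$ for $1 \le n \le 10$.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[2, -1, 0], [-2, 1, -2], [1, 1, 3]]
P = [[1, 1, 1], [1, 0, -1], [-1, -1, 0]]       # columns v1, v2, v3
P_inv = [[1, 1, 1], [-1, -1, -2], [1, 0, 1]]   # inverse found above

identity = [[int(i == j) for j in range(3)] for i in range(3)]
assert mat_mul(P, P_inv) == identity
assert mat_mul(mat_mul(P_inv, A), P) == [[1, 0, 0], [0, 2, 0], [0, 0, 3]]


def closed_form(n):
    """Entries of A^n = P diag(1, 2^n, 3^n) P^{-1}, as computed above."""
    return [[1 - 2**n + 3**n, 1 - 2**n, 1 - 2**(n + 1) + 3**n],
            [1 - 3**n, 1, 1 - 3**n],
            [2**n - 1, 2**n - 1, 2**(n + 1) - 1]]


power = identity
for n in range(1, 11):
    power = mat_mul(power, A)   # power is now A^n
    assert power == closed_form(n)
print("closed form confirmed for n = 1, ..., 10")
```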


Problem 5.46. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear map defined by
$$T(x, y, z) = (2x + y - z, y, x + y).$$
Let $e_1, e_2, e_3$ be the canonical basis of $\mathbb{R}^3$ and let
$$v_1 = e_1 + e_3, \qquad v_2 = -e_1 + e_2, \qquad v_3 = e_1 + e_2 + e_3.$$
a) Prove that $(v_1, v_2, v_3)$ is a basis of $\mathbb{R}^3$.

b) Find the matrix of $T$ with respect to this basis.

Solution. a) In order to prove that $(v_1, v_2, v_3)$ is a basis of $\mathbb{R}^3$, it suffices to check
that they are linearly independent. If
$$av_1 + bv_2 + cv_3 = 0$$
for some real numbers $a, b, c$, then
$$(a - b + c)e_1 + (b + c)e_2 + (a + c)e_3 = 0.$$
Since $e_1, e_2, e_3$ are linearly independent, this forces
$$a - b + c = 0, \qquad b + c = 0, \qquad a + c = 0.$$

The first and third equations yield $b = 0$, then $c = 0$ and $a = 0$. Thus $(v_1, v_2, v_3)$
is a basis of $\mathbb{R}^3$. Another method for proving this is as follows: consider the
matrix $A$ whose columns are the coordinates of $v_1, v_2, v_3$ when expressed in terms
of the canonical basis, that is

$$A = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.$$
Row-reduction yields $A_{\text{ref}} = I_3$ and the result follows.
b) We compute

$$T(v_1) = T(1, 0, 1) = (1, 0, 1) = v_1,$$
then
$$T(v_2) = T(-1, 1, 0) = (-1, 1, 0) = v_2$$
and finally
$$T(v_3) = T(1, 1, 1) = (2, 1, 2).$$

To conclude, we need to express the vector $(2, 1, 2)$ in terms of $v_1, v_2, v_3$. We look
therefore for $a, b, c$ such that

$$(2, 1, 2) = av_1 + bv_2 + cv_3$$
or equivalently
$$(2, 1, 2) = (a - b + c, b + c, a + c).$$
Solving the corresponding linear system yields

$$a = 1, \qquad b = 0, \qquad c = 1.$$

Thus $T(v_3) = v_1 + v_3$ and so the matrix of $T$ with respect to $(v_1, v_2, v_3)$ is

$$B = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$ □
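To check part b) without inverting anything, let $P$ be the matrix whose columns are $v_1, v_2, v_3$ (the matrix $A$ of part a)) and let $M$ be the standard matrix of $T$; these names are ours, chosen to avoid a clash with the $A$ of part a). Since $P$ is invertible, Corollary 5.44 says that $B = P^{-1}MP$, which is equivalent to $MP = PB$. A minimal Python sketch of this check:

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


M = [[2, 1, -1],   # standard matrix of T(x, y, z) = (2x + y - z, y, x + y)
     [0, 1, 0],
     [1, 1, 0]]
P = [[1, -1, 1],   # columns v1 = e1 + e3, v2 = -e1 + e2, v3 = e1 + e2 + e3
     [0, 1, 1],
     [1, 0, 1]]
B = [[1, 0, 1],    # matrix of T in the basis (v1, v2, v3), as found above
     [0, 1, 0],
     [0, 0, 1]]

assert det3(P) != 0                     # P is invertible ...
assert mat_mul(M, P) == mat_mul(P, B)   # ... so B = P^{-1} M P
```

Checking $MP = PB$ instead of computing $P^{-1}$ keeps the arithmetic in integers.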


Motivated by the previous corollary, we introduce the following fundamental 
definition: 


Definition 5.47. Two matrices $A, B \in M_n(F)$ are called similar or conjugate if
there exists $P \in GL_n(F)$ such that $B = P^{-1}AP$. Equivalently, they are similar if
they represent the same linear transformation of $V = F^n$ in possibly two different
bases.

It is an easy exercise for the reader to prove that similarity is an equivalence
relation on $M_n(F)$, that is:

• any matrix $A$ is similar to itself;
• if $A$ is similar to $B$, then $B$ is similar to $A$;
• if $A$ is similar to $B$ and $B$ is similar to $C$, then $A$ is similar to $C$.
One of the most fundamental problems in linear algebra is the classification of
matrices up to similarity. In fact, the main goal of the next chapters is to prove that 
suitable matrices are similar to rather simple matrices: we dedicate a whole chapter 
to matrices similar to diagonal and upper-triangular ones, and we will see in the last 
chapter that any symmetric matrix with real entries is similar to a diagonal matrix. 


5.3.1 Problems for practice 


1. Let $\mathcal{B} = (e_1, e_2)$ be the canonical basis of $\mathbb{R}^2$ and let $\mathcal{B}' = (f_1, f_2)$, where
$$f_1 = e_1 + e_2, \quad f_2 = e_1 + 2e_2.$$

a) Prove that $\mathcal{B}'$ is a basis of $\mathbb{R}^2$.

b) Find the change of basis matrix $P$ from $\mathcal{B}$ to $\mathcal{B}'$, as well as its inverse.

c) Let $T$ be the linear transformation on $\mathbb{R}^2$ whose matrix with respect to the basis $\mathcal{B}$ (both on the source and target of $T$) is $A = \begin{pmatrix} 1 & -1 \\ 2 & -3 \end{pmatrix}$. Find the matrix of $T$ with respect to the bases $\mathcal{B}'$ on the target and $\mathcal{B}$ on the source.


2. Consider the matrix 


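$$A = \begin{pmatrix} 17 & -28 & 4 \\ 12 & -20 & 3 \\ 16 & -28 & 5 \end{pmatrix}$$
and the associated linear map $T : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $T(X) = AX$.

a) Find a basis $\mathcal{B}_1$ of the kernel of $T$.

b) Let $V$ be the kernel of $T - \mathrm{id}$, where $\mathrm{id}$ is the identity map on $\mathbb{R}^3$. Give a basis $\mathcal{B}_2$ of $V$.

c) Prove that $V \oplus \ker(T) = \mathbb{R}^3$.

d) Find the matrix of $T$ with respect to the basis $\mathcal{B}_1 \cup \mathcal{B}_2$ of $\mathbb{R}^3$.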


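3. Let $\mathcal{B} = (v_1, v_2, v_3)$, where
$$v_1 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},$$
and let $\mathcal{B}' = (w_1, w_2, w_3)$, where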
2 —3 —2 
wy=]O], wo=]—2], w3=] -3 
a) Prove that $\mathcal{B}$ and $\mathcal{B}'$ are both bases of $\mathbb{R}^3$.
b) Find the change of basis matrix $P$ from $\mathcal{B}$ to $\mathcal{B}'$ as well as its inverse $P^{-1}$.
c) Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ whose matrix with respect to the basis $\mathcal{B}$ (both on the source and target of $T$) is $\begin{pmatrix} 1 & 0 & 4 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}$. Find the matrix of $T$ with respect to $\mathcal{B}'$ (both on the source and target of $T$).


4. Let $V$ be a vector space over a field $F$, of dimension $n$. Let $T : V \to V$ be a projection (recall that this is a linear map such that $T \circ T = T$).

a) Prove that $V = \ker(T) \oplus \mathrm{Im}(T)$.
b) Prove that there is a basis of $V$ in which the matrix of $T$ is $\begin{pmatrix} I_i & 0 \\ 0 & 0 \end{pmatrix}$ for some $i \in \{0, 1, \ldots, n\}$.


5. Let $V$ be a vector space over $\mathbb{C}$ or $\mathbb{R}$, of dimension $n$. Let $T : V \to V$ be a symmetry (that is, a linear transformation such that $T \circ T = \mathrm{id}$ is the identity map of $V$).

a) Prove that $V = \ker(T - \mathrm{id}) \oplus \ker(T + \mathrm{id})$.
b) Deduce that there is $i \in [0, n]$ and a basis of $V$ such that the matrix of $T$ with respect to this basis is $\begin{pmatrix} I_i & 0 \\ 0 & -I_{n-i} \end{pmatrix}$.


6. Let $T$ be the linear transformation on $\mathbb{R}^3$ whose matrix with respect to the canonical basis is
$$A = \begin{pmatrix} -1 & 1 & 1 \\ -6 & 4 & 2 \\ 3 & -1 & 1 \end{pmatrix}.$$

a) Check that $A^2 = 2A$.

b) Deduce that $T(v) = 2v$ for all $v \in \mathrm{Im}(T)$.

c) Prove that $\ker(T)$ and $\mathrm{Im}(T)$ are in direct sum position in $\mathbb{R}^3$.

d) Give bases for $\ker(T)$ and $\mathrm{Im}(T)$, and write the matrix of $T$ with respect to the basis of $\mathbb{R}^3$ obtained by patching together the bases of $\ker(T)$ and $\mathrm{Im}(T)$.
7. Let A = E r and consider the map $T : M_2(\mathbb{C}) \to M_2(\mathbb{C})$ defined by
$$T(B) = AB - BA.$$

a) Prove that $T$ is linear.
b) Find the matrix of $T$ with respect to the canonical basis of $M_2(\mathbb{C})$.
8. Let $V$ be the vector space of polynomials with complex coefficients whose degree does not exceed 3. Let $T : V \to V$ be the map defined by $T(P) = P + P'$. Prove that $T$ is linear and find the matrix of $T$ with respect to the basis $1, X, X^2, X^3$ of $V$.

9. a) Find the matrix with respect to the canonical basis of the map which projects a vector $v \in \mathbb{R}^3$ to the $xy$-plane.

b) Find the matrix with respect to the canonical basis of the map which sends a vector $v \in \mathbb{R}^3$ to its reflection with respect to the $xy$-plane.

c) Let $\theta \in \mathbb{R}$. Find the matrix with respect to the canonical basis of the map which sends a vector $v \in \mathbb{R}^3$ to its rotation through an angle $\theta$, counterclockwise.

10. Let $V$ be a vector space of dimension $n$ over $F$. A flag in $V$ is a family of subspaces $V_0 \subset V_1 \subset \cdots \subset V_n$ such that $\dim V_i = i$ for all $i \in [0, n]$. Let $T : V \to V$ be a linear transformation. Prove that the following statements are equivalent:

a) There is a flag $V_0 \subset \cdots \subset V_n$ in $V$ such that $T(V_i) \subset V_i$ for all $i \in [0, n]$.
b) There is a basis of $V$ with respect to which the matrix of $T$ is upper-triangular.


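11. Prove that the matrices
$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
are similar.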


5.4 Rank of a Linear Map and Rank of a Matrix 


In this section we discuss a very important numerical invariant associated with a linear transformation and with a matrix: its rank. All vector spaces over the field $F$ will be assumed to be finite dimensional in this section.


Definition 5.48. Let $V, W$ be finite dimensional vector spaces over $F$. The rank of a linear map $T : V \to W$ is the integer
$$\mathrm{rank}(T) = \dim \mathrm{Im}(T).$$
Let us try to understand the previous definition more concretely. Let $T : V \to W$ be a linear transformation and let $v_1, \ldots, v_n$ be a basis of $V$. Then the elements of $\mathrm{Im}(T)$ are of the form $T(v)$ with $v \in V$. Since $v_1, \ldots, v_n$ span $V$, each $v \in V$ can be written $v = x_1 v_1 + \cdots + x_n v_n$ with $x_i \in F$, and
$$T(v) = T(x_1 v_1 + \cdots + x_n v_n) = x_1 T(v_1) + \cdots + x_n T(v_n).$$
Thus $T(v_1), \ldots, T(v_n)$ is a spanning set for $\mathrm{Im}(T)$ and
$$\mathrm{rank}(T) = \dim \mathrm{Span}(T(v_1), \ldots, T(v_n)).$$

Since we have already seen an algorithmic way of computing the span of a finite family of vectors (using row-reduction, see the discussion preceding Example 4.30), this gives an algorithmic way of computing the rank of a linear transformation. More precisely, pick a basis $w_1, \ldots, w_m$ of $W$ and express each of the vectors $T(v_1), \ldots, T(v_n)$ as a linear combination of $w_1, \ldots, w_m$. Consider the matrix $A$ whose rows are the coordinates of $T(v_1), \ldots, T(v_n)$ when expressed in the basis $w_1, \ldots, w_m$ of $W$. Performing elementary operations on the rows of $A$ does not change the span of its rows, so $\mathrm{rank}(T)$ is the dimension of the span of the rows of $A_{\mathrm{ref}}$, the reduced row-echelon form of $A$. On the other hand, this last dimension is very easy to compute: by definition of the reduced row-echelon form, the dimension of the span of the rows of $A_{\mathrm{ref}}$ is precisely the number of nonzero rows in $A_{\mathrm{ref}}$ or, equivalently, the number of pivots in $A_{\mathrm{ref}}$. Thus
$$\mathrm{rank}(T) = \text{number of nonzero rows of } A_{\mathrm{ref}} = \text{number of pivots in } A_{\mathrm{ref}}.$$
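The procedure above (build $A$ row by row from the coordinates of $T(v_1), \ldots, T(v_n)$, row-reduce, count pivots) can be sketched in Python. This is a minimal illustration under our own assumptions: `rref` and `rank` are hypothetical helper names, entries are stored as `fractions.Fraction` so that no rounding occurs, and the matrix used is the one arising in Problem 5.49 below.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows.

    Returns (R, pivots) where pivots[k] is the pivot column of row k.
    Uses exact rational arithmetic, so no rounding errors occur.
    """
    R = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(R)
    n_cols = len(R[0]) if R else 0
    pivots = []
    r = 0  # index of the row that receives the next pivot
    for c in range(n_cols):
        if r == n_rows:
            break
        p = next((i for i in range(r, n_rows) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]  # scale the pivot to 1
        for i in range(n_rows):  # clear column c above and below the pivot
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def rank(rows):
    """Number of pivots, i.e. number of nonzero rows of the RREF."""
    return len(rref(rows)[1])


# Rows = coordinates of T(e1), T(e2), T(e3) for T(x, y, z) = (x+y+z, x-y, y-z, z-x).
A = [[1, 1, 0, -1],
     [1, -1, 1, 0],
     [1, 0, -1, 1]]
R, pivots = rref(A)
print(rank(A))  # 3
```

Exact fractions are a deliberate choice here: with floating point, deciding whether an entry "is zero" needs a tolerance, which the pivot count is sensitive to.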


Let us see two concrete examples:
Problem 5.49. Compute the rank of the linear map $T : \mathbb{R}^3 \to \mathbb{R}^4$ defined by
$$T(x, y, z) = (x + y + z,\ x - y,\ y - z,\ z - x).$$
Solution. We let $v_1, v_2, v_3$ be the canonical basis of $\mathbb{R}^3$ and compute
$$T(v_1) = T(1, 0, 0) = (1, 1, 0, -1),$$
thus the first row of the matrix $A$ in the previous discussion is $(1, 1, 0, -1)$. We do the same with $v_2, v_3$ and we obtain
$$A = \begin{pmatrix} 1 & 1 & 0 & -1 \\ 1 & -1 & 1 & 0 \\ 1 & 0 & -1 & 1 \end{pmatrix}.$$
Using row-reduction we compute
$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix}$$
and we deduce that
$$\mathrm{rank}(T) = 3.$$

Problem 5.50. Let $V$ be the space of polynomials with real coefficients of degree not exceeding 3, and let $T : V \to V$ be the linear map defined by
$$T(P(X)) = P(X + 1) - P(X).$$
Find $\mathrm{rank}(T)$.

Solution. We start by choosing the canonical basis $1, X, X^2, X^3$ of $V$ and computing
$$T(1) = 0, \quad T(X) = X + 1 - X = 1, \quad T(X^2) = (X + 1)^2 - X^2 = 1 + 2X$$
and
$$T(X^3) = (X + 1)^3 - X^3 = 1 + 3X + 3X^2.$$
The matrix $A$ in the previous discussion is
$$A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 1 & 3 & 3 & 0 \end{pmatrix}$$
and row-reduction yields
$$A_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
There are three pivots, thus
$$\mathrm{rank}(T) = 3. \qquad \square$$
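The rows of $A$ in Problem 5.50 come from the binomial theorem, $(X+1)^j - X^j = \sum_{i<j} \binom{j}{i} X^i$, so the computation generalizes to polynomials of degree at most $n$. The sketch below is our own illustration, not the book's method: `difference_matrix` and `rank` are hypothetical helpers, and the rank is obtained by counting pivots of an (unreduced) echelon form over the rationals, which gives the same number as the reduced form.

```python
from fractions import Fraction
from math import comb


def difference_matrix(n):
    """Rows = coordinates of T(X^j) = (X+1)^j - X^j in the basis 1, X, ..., X^n.

    By the binomial theorem, (X+1)^j - X^j = sum over i < j of C(j, i) X^i.
    """
    return [[comb(j, i) if i < j else 0 for i in range(n + 1)] for j in range(n + 1)]


def rank(rows):
    """Count pivots produced by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


A = difference_matrix(3)
print(A)        # [[0, 0, 0, 0], [1, 0, 0, 0], [1, 2, 0, 0], [1, 3, 3, 0]]
print(rank(A))  # 3
```

Replacing 3 by a larger degree bound $n$ gives rank $n$, consistent with the fact that the kernel of $T$ consists of the constant polynomials.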


We turn now to a series of more theoretical exercises, which establish some other important properties of the rank of a linear map. In all problems below we assume that the vector spaces appearing in the statements are finite dimensional.

Problem 5.51. Let $T : V \to W$ be a linear map. Prove that
$$\mathrm{rank}(T) \le \min(\dim V, \dim W).$$

Solution. Since $\mathrm{Im}(T) \subset W$, we have $\mathrm{rank}(T) \le \dim W$. As we have already seen, if $v_1, \ldots, v_n$ is a basis of $V$, then $\mathrm{Im}(T)$ is spanned by $T(v_1), \ldots, T(v_n)$, thus
$$\mathrm{rank}(T) \le n = \dim V.$$
The result follows by combining the two inequalities.
Problem 5.52. Let $T_1 : U \to V$ and $T_2 : V \to W$ be linear maps. Prove that
$$\mathrm{rank}(T_2 \circ T_1) \le \min(\mathrm{rank}(T_1), \mathrm{rank}(T_2)).$$

Solution. The image of $T_2 \circ T_1$ is included in that of $T_2$, thus $\mathrm{rank}(T_2 \circ T_1) \le \mathrm{rank}(T_2)$. Next, we consider the restriction $T_2'$ of $T_2$ to $\mathrm{Im}(T_1)$, obtaining a linear map $T_2' : \mathrm{Im}(T_1) \to W$ whose image clearly equals that of $T_2 \circ T_1$. Applying Problem 5.51 to $T_2'$ we obtain
$$\mathrm{rank}(T_2 \circ T_1) = \mathrm{rank}(T_2') \le \dim(\mathrm{Im}(T_1)) = \mathrm{rank}(T_1),$$
and the result follows. □


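Problem 5.53. Let $T_1, T_2 : V \to W$ be linear transformations. Prove that
$$|\mathrm{rank}(T_1) - \mathrm{rank}(T_2)| \le \mathrm{rank}(T_1 + T_2) \le \mathrm{rank}(T_1) + \mathrm{rank}(T_2).$$

Solution. We have $\mathrm{Im}(T_1 + T_2) \subset \mathrm{Im}(T_1) + \mathrm{Im}(T_2)$ and so
$$\mathrm{rank}(T_1 + T_2) \le \dim(\mathrm{Im}(T_1) + \mathrm{Im}(T_2)) \le \dim \mathrm{Im}(T_1) + \dim \mathrm{Im}(T_2) = \mathrm{rank}(T_1) + \mathrm{rank}(T_2),$$
establishing the inequality on the right. On the other hand, we clearly have $\mathrm{Im}(T_2) = \mathrm{Im}(-T_2)$, thus $\mathrm{rank}(T_2) = \mathrm{rank}(-T_2)$. Applying what we have already proved to the maps $T_1 + T_2$ and $-T_2$, whose sum is $T_1$, we obtain
$$\mathrm{rank}(T_1 + T_2) + \mathrm{rank}(T_2) = \mathrm{rank}(T_1 + T_2) + \mathrm{rank}(-T_2) \ge \mathrm{rank}(T_1),$$
thus $\mathrm{rank}(T_1 + T_2) \ge \mathrm{rank}(T_1) - \mathrm{rank}(T_2)$. We conclude using the symmetry in $T_1$ and $T_2$. □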


Problem 5.54. Prove that if $S_1 : U \to V$, $T : V \to W$ and $S_2 : W \to Z$ are linear maps such that $S_1, S_2$ are bijective, then
$$\mathrm{rank}(S_2 T S_1) = \mathrm{rank}(T).$$
Solution. Since $S_1$ is bijective, we have
$$(TS_1)(U) = T(S_1(U)) = T(V) = \mathrm{Im}(T).$$
Since $S_2$ is bijective, it realizes an isomorphism between $(TS_1)(U)$ and $S_2((TS_1)(U))$, thus these two spaces have the same dimension. We conclude that
$$\mathrm{rank}(T) = \dim \mathrm{Im}(T) = \dim (TS_1)(U) = \dim S_2((TS_1)(U)) = \dim (S_2 T S_1)(U) = \mathrm{rank}(S_2 T S_1).$$
Note that we only used the surjectivity of $S_1$ and the injectivity of $S_2$.
We will now prove the first fundamental theorem concerning the rank of a linear map:

Theorem 5.55 (The Rank-Nullity Theorem). Let $V, W$ be vector spaces over a field $F$ and let $T : V \to W$ be a linear transformation. If $V$ is finite-dimensional, then
$$\dim \ker T + \mathrm{rank}(T) = \dim V. \tag{5.3}$$
Proof. Let $n = \dim V$ and let $r = \dim \ker T$. Since $\ker T$ is a subspace of $V$, we have $r \le n$, in particular $r < \infty$. We need to prove that $\dim \mathrm{Im}\, T = n - r$.

Let $v_1, \ldots, v_r$ be a basis of $\ker T$ and extend it to a basis $v_1, \ldots, v_n$ of $V$. We will prove that $T(v_{r+1}), \ldots, T(v_n)$ form a basis of $\mathrm{Im}(T)$, which will yield the desired result.

Let us start by proving that $T(v_{r+1}), \ldots, T(v_n)$ are linearly independent. Suppose that $a_{r+1}, \ldots, a_n$ are scalars in $F$ such that
$$a_{r+1} T(v_{r+1}) + \cdots + a_n T(v_n) = 0.$$
This equality can be written as $T(a_{r+1} v_{r+1} + \cdots + a_n v_n) = 0$, or equivalently $a_{r+1} v_{r+1} + \cdots + a_n v_n \in \ker T$. We can therefore write
$$a_{r+1} v_{r+1} + \cdots + a_n v_n = b_1 v_1 + \cdots + b_r v_r$$
for some scalars $b_1, \ldots, b_r \in F$. But since $v_1, \ldots, v_n$ form a basis of $V$, the last relation forces $a_{r+1} = \cdots = a_n = 0$ and $b_1 = \cdots = b_r = 0$, proving that $T(v_{r+1}), \ldots, T(v_n)$ are linearly independent.

Next, we prove that $T(v_{r+1}), \ldots, T(v_n)$ span $\mathrm{Im}(T)$. Let $x \in \mathrm{Im}(T)$. By definition, there is $v \in V$ such that $x = T(v)$. Since $v_1, \ldots, v_n$ span $V$, we can find scalars $a_1, \ldots, a_n \in F$ such that $v = a_1 v_1 + \cdots + a_n v_n$. Since $v_1, \ldots, v_r \in \ker T$, we obtain
$$x = T(v) = \sum_{i=1}^{n} a_i T(v_i) = \sum_{i=r+1}^{n} a_i T(v_i) \in \mathrm{Span}(T(v_{r+1}), \ldots, T(v_n)).$$
This finishes the proof of the theorem. □


Corollary 5.56. Let $V$ be a finite-dimensional vector space over a field $F$ and let $T : V \to V$ be a linear transformation. Then the following assertions are equivalent:

a) $T$ is injective.
b) $T$ is surjective.
c) $T$ is bijective.

Proof. Suppose that a) holds. Then the rank-nullity theorem and the fact that $\ker T = 0$ yield $\dim \mathrm{Im}(T) = \dim V$. Since $\mathrm{Im}(T)$ is a subspace of the finite-dimensional vector space $V$ and $\dim \mathrm{Im}(T) = \dim V$, we deduce that $\mathrm{Im}(T) = V$ and so $T$ is surjective, thus b) holds.

Suppose now that b) holds, thus $\dim \mathrm{Im}(T) = \dim V$. The rank-nullity theorem yields $\dim \ker T = 0$, thus $\ker T = 0$ and then $T$ is injective. Since it is also surjective by assumption, it follows that c) holds. Since c) clearly implies a), the result follows. □


Remark 5.57. Without the assumption that $V$ is finite dimensional, the previous result no longer holds: one can find linear transformations $T : V \to V$ which are injective and not surjective, and linear transformations which are surjective and not injective. Indeed, let $V$ be the space of all sequences $(x_n)_{n \ge 0}$ of real numbers and define two maps $T_1, T_2 : V \to V$ by
$$T_1(x_0, x_1, \ldots) = (x_1, x_2, \ldots), \qquad T_2(x_0, x_1, \ldots) = (0, x_0, x_1, \ldots).$$
Then $T_1$ is surjective but not injective (the sequence $(1, 0, 0, \ldots)$ lies in its kernel), and $T_2$ is injective but not surjective (no sequence with $x_0 \ne 0$ lies in its image).


Problem 5.58. Let $A$ and $B$ be $n \times n$ matrices such that $AB$ is invertible. Show that both $A$ and $B$ are invertible.

Solution. Let $T_1 : F^n \to F^n$ and $T_2 : F^n \to F^n$ be the linear maps associated with $A$ and $B$ respectively (so $T_1(X) = AX$ and $T_2(X) = BX$). Then $AB$ is the matrix of the linear map $T_1 \circ T_2$ with respect to the canonical basis of $F^n$ (both on the source and on the target). Since $AB$ is invertible, we deduce that $T_1 \circ T_2$ is bijective, hence $T_2$ is injective and $T_1$ is surjective. But an injective or surjective linear transformation on a finite dimensional vector space is automatically bijective (Corollary 5.56). Thus $T_1$ and $T_2$ are both bijective and the result follows from Problem 5.37. □


Problem 5.59. Let $A, B \in M_n(\mathbb{C})$ satisfy $AB = I_n$. Prove that $BA = I_n$.

Solution. By the previous problem, $A$ and $B$ are invertible. Multiplying the relation $AB = I_n$ on the left by $A^{-1}$ yields $B = A^{-1}$. Thus $BA = A^{-1}A = I_n$. □

Problem 5.60. Show that if $A$ and $B$ are square matrices in $M_n(\mathbb{C})$ with $AB = A + B$, then $AB = BA$.

Solution. The condition $AB = A + B$ implies $(A - I_n)(B - I_n) = I_n$. Therefore $A - I_n$ and $B - I_n$ are mutually inverse (Problem 5.59) and $(B - I_n)(A - I_n) = I_n$, which implies $BA = A + B = AB$. □


Problem 5.61. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by
$$T(x, y, z) = (x - y,\ 2x - y - z,\ x - 2y + z).$$
Find the kernel of $T$ and the rank of $T$.

Solution. In order to find the kernel of $T$, we need to find those $(x, y, z) \in \mathbb{R}^3$ such that
$$x - y = 0, \quad 2x - y - z = 0, \quad x - 2y + z = 0.$$
The first equation gives $x = y$, the second one $z = x$ and so $x = y = z$, which satisfies all equations. It follows that the kernel of $T$ is the subspace $\{(x, x, x) \mid x \in \mathbb{R}\}$, which is precisely the line spanned by the vector $(1, 1, 1)$.

Next, the rank of $T$ can be determined from the rank-nullity theorem:
$$3 = \dim \mathbb{R}^3 = \dim \ker T + \mathrm{rank}(T) = 1 + \mathrm{rank}(T),$$
thus $\mathrm{rank}(T) = 2$. □
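The kernel and the rank can also be read off one reduced row-echelon form, which makes the rank-nullity theorem visible: pivot columns count the rank, free columns count the dimension of the kernel. A minimal sketch under our own assumptions (the helpers `rref` and `kernel_basis` are hypothetical names; exact rational arithmetic), applied to the matrix of $T$ from Problem 5.61:

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form and pivot columns, over the rationals."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0])):
        if r == len(R):
            break
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def kernel_basis(rows):
    """Basis of {X : AX = 0}: one vector per free (non-pivot) column of the RREF."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)  # this free variable is 1, the other free variables are 0
        for row, p in zip(R, pivots):
            v[p] = -row[f]  # solve for the pivot variable of each nonzero row
        basis.append(v)
    return basis


# Matrix of T(x, y, z) = (x - y, 2x - y - z, x - 2y + z) in the canonical basis.
A = [[1, -1, 0],
     [2, -1, -1],
     [1, -2, 1]]
K = kernel_basis(A)
rank_A = len(rref(A)[1])
print([[int(x) for x in v] for v in K])  # [[1, 1, 1]]
print(rank_A, rank_A + len(K))           # 2 3
```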
We turn now to the analogous concept for matrices.

Definition 5.62. Let $A \in M_{m,n}(F)$. The rank of $A$ is the integer $\mathrm{rank}(A)$ defined as the rank of the linear map $F^n \to F^m$ sending $X$ to $AX$ (i.e., the canonical linear map attached to $A$).


Remark 5.63. We can restate the results established in Problems 5.51, 5.52, 5.53, and 5.54 in terms of matrices as follows:

a) $\mathrm{rank}(A) \le \min(m, n)$ if $A \in M_{m,n}(F)$.

b) $|\mathrm{rank}(A) - \mathrm{rank}(B)| \le \mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$ for all $A, B \in M_{m,n}(F)$.

c) $\mathrm{rank}(PAQ) = \mathrm{rank}(A)$ for all $P \in GL_m(F)$, $A \in M_{m,n}(F)$ and $Q \in GL_n(F)$. That is, the rank of a matrix does not change if we multiply it (on the left or on the right) by invertible matrices.

d) $\mathrm{rank}(AB) \le \min(\mathrm{rank}(A), \mathrm{rank}(B))$ for $A \in M_{m,n}(F)$ and $B \in M_{n,p}(F)$.


Of course, we can also make the definition very concrete: let $A \in M_{m,n}(F)$ and let $e_1, e_2, \ldots, e_n$ be the canonical basis of $F^n$. Write $\varphi : F^n \to F^m$ for the linear map $X \mapsto AX$ canonically attached to $A$. By the previous discussion $\mathrm{Im}(\varphi)$ is the span of $\varphi(e_1), \ldots, \varphi(e_n)$. Now, if $C_1, \ldots, C_n$ are the columns of $A$, seen as column vectors in $F^m$, then by definition $\varphi(e_i) = C_i$ for all $i$. We conclude that the image of $\varphi$ is the span of $C_1, \ldots, C_n$.

Let us summarize the previous discussion in an important theorem:

Theorem 5.64. Let $A \in M_{m,n}(F)$ and let $C_1, C_2, \ldots, C_n \in F^m$ be its columns. Then
$$\mathrm{rank}(A) = \dim \mathrm{Span}(C_1, C_2, \ldots, C_n).$$


So, following the previous discussion, we obtain the following algorithm for computing the rank of $A$: consider the transpose ${}^tA$ of $A$ (thus the columns of $A$ become rows in the new matrix) and bring it to its reduced row-echelon form. Then count the number of nonzero rows or, equivalently, the number of pivots. This is the rank of $A$. We will see later on (see Problem 5.70) that the trick of considering the transpose of $A$ is actually not necessary: $A$ and ${}^tA$ have the same rank. Of course, we can also avoid considering the transpose matrix and instead use column operations on $A$.
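A short sketch of the algorithm just described, under our own assumptions: `rank` and `transpose` are hypothetical helpers, the rank is computed by counting pivots over the rationals, and the $3 \times 4$ matrix is an arbitrary example (its second row is twice the first). Row-reducing ${}^tA$ measures the span of the columns of $A$; row-reducing $A$ itself gives the same number, as Problem 5.70 predicts.

```python
from fractions import Fraction


def rank(rows):
    """Count pivots produced by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def transpose(rows):
    """The columns of `rows` become the rows of the result (the matrix tA)."""
    return [list(col) for col in zip(*rows)]


A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]  # arbitrary example
print(rank(transpose(A)))  # 2: dimension of the span of the columns of A
print(rank(A))             # 2: same value, by Problem 5.70
```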
Problem 5.65. Compute the rank of the matrix
0 1 
1 0 


—1 —1 
1.2 


| 
ean 
= 


Solution. Following the previous discussion we bring the matrix 


-12 0 1 1 
TARERE 
0 1—1 1 -1 
1 0—-1—2 1 


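to its reduced row-echelon form by row-reduction
$$({}^tA)_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & -1 \end{pmatrix}.$$
Since there are 4 nonzero rows, we deduce that
$$\mathrm{rank}(A) = 4. \qquad \square$$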


Problem 5.66 (Sylvester's Inequality). Prove that for all $A, B \in M_n(F)$ we have
$$\mathrm{rank}(AB) \ge \mathrm{rank}(A) + \mathrm{rank}(B) - n.$$

Solution. Consider $V = F^n$ and the linear transformations $T_1, T_2 : V \to V$ sending $X$ to $AX$, respectively $BX$. We need to prove that
$$\mathrm{rank}(T_1 \circ T_2) \ge \mathrm{rank}(T_1) + \mathrm{rank}(T_2) - \dim V.$$
By the rank-nullity theorem we know that
$$\mathrm{rank}(T_1) - \dim V = -\dim \ker T_1,$$
thus it suffices to prove that
$$\mathrm{rank}(T_2) - \mathrm{rank}(T_1 \circ T_2) \le \dim \ker T_1.$$
Let $W = T_2(V) = \mathrm{Im}(T_2)$ and let $T_1' : W \to V$ be the restriction of $T_1$ to $W$. Then, using again the rank-nullity theorem, we obtain
$$\mathrm{rank}(T_1 \circ T_2) = \dim T_1(W) = \mathrm{rank}(T_1') = \dim W - \dim \ker T_1'.$$
Now $\dim W = \mathrm{rank}(T_2)$, so we are reduced to proving that
$$\dim \ker T_1' \le \dim \ker T_1.$$
This is clear, as $\ker T_1' = \ker T_1 \cap W \subset \ker T_1$. □
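Sylvester's inequality is easy to spot-check numerically. The sketch below is only an illustration, not a proof: `rank`, `matmul` and `sylvester_holds` are hypothetical helpers, ranks are computed exactly over the rationals, and the random matrices use entries in $\{-1, 0, 1\}$ so that rank-deficient products actually occur. The nilpotent matrix $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ shows that equality can hold.

```python
import random
from fractions import Fraction


def rank(rows):
    """Number of pivots found by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sylvester_holds(A, B):
    """Check rank(AB) >= rank(A) + rank(B) - n for n x n matrices A, B."""
    n = len(A)
    return rank(matmul(A, B)) >= rank(A) + rank(B) - n


# Equality case: N has rank 1 and N*N = 0, so 0 >= 1 + 1 - 2.
N = [[0, 1], [0, 0]]
print(sylvester_holds(N, N))  # True

# Random spot checks (a fixed seed keeps the run reproducible).
rng = random.Random(0)
for _ in range(200):
    n = rng.randint(1, 4)
    A = [[rng.randint(-1, 1) for _ in range(n)] for _ in range(n)]
    B = [[rng.randint(-1, 1) for _ in range(n)] for _ in range(n)]
    assert sylvester_holds(A, B)
```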


Problem 5.67. Let $A \in M_{3,2}(\mathbb{R})$ and $B \in M_{2,3}(\mathbb{R})$ be matrices such that
$$AB = \begin{pmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ 1 & 1 & 2 \end{pmatrix}.$$

a) Check that $(AB)^2 = AB$ and that $AB$ has rank 2.
b) Prove that $BA$ is invertible.
c) Prove that $(BA)^3 = (BA)^2$ and deduce that $BA = I_2$.

Solution. a) One checks using the product rule that the matrix
$$X = \begin{pmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ 1 & 1 & 2 \end{pmatrix}$$
satisfies $X^2 = X$. Next, one computes the rank of $X$ by computing the reduced row-echelon form of ${}^tX$:
$$({}^tX)_{\mathrm{ref}} = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix}.$$
Since there are two pivots, $AB = X$ has rank 2.

b) Using part d) of Remark 5.63, we obtain
$$\mathrm{rank}(BA) \ge \mathrm{rank}(A(BA)B) = \mathrm{rank}((AB)^2) = \mathrm{rank}(AB) = 2.$$
On the other hand, $BA$ is a $2 \times 2$ matrix, thus necessarily $\mathrm{rank}(BA) = 2$ and so $BA$ is invertible.

c) Since $(AB)^2 = AB$, we have
$$B(AB)^2A = B(AB)A = (BA)^2.$$
The left-hand side equals $(BA)^3$ and so $(BA)^3 = (BA)^2$. Since $BA$ is invertible, multiplying by $(BA)^{-2}$ gives $BA = I_2$, and the problem is solved. □
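Problem 5.67 only assumes that some factorization $AB = X$ exists. As a hedged concrete instance (this choice of $A$ and $B$ is ours, not given in the problem): the third column of $X$ is the sum of the first two, so one may take for $A$ the first two columns of $X$ and $B = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$. The sketch checks $AB = X$, $X^2 = X$ and $BA = I_2$ with a hypothetical `matmul` helper.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


X = [[0, -1, -1],
     [-1, 0, -1],
     [1, 1, 2]]
A = [[0, -1],    # first two columns of X (one possible choice)
     [-1, 0],
     [1, 1]]
B = [[1, 0, 1],  # column 3 of X = column 1 + column 2
     [0, 1, 1]]

print(matmul(A, B) == X)  # True
print(matmul(X, X) == X)  # True: X is idempotent
print(matmul(B, A))       # [[1, 0], [0, 1]], i.e. BA = I_2
```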


The second fundamental theorem concerning rank is the following: 


Theorem 5.68. Let $A \in M_{m,n}(F)$ and let $0 \le r \le \min(m, n)$. Then $\mathrm{rank}(A) = r$ if and only if there are matrices $P \in GL_m(F)$ and $Q \in GL_n(F)$ such that $A = P J_r Q$, where
$$J_r = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in M_{m,n}(F).$$

Proof. If $A = P J_r Q$, then by part c) of Remark 5.63 we have $\mathrm{rank}(A) = \mathrm{rank}(J_r)$. The linear map associated with $J_r$ is $(x_1, \ldots, x_n) \mapsto (x_1, \ldots, x_r, 0, \ldots, 0)$, and its image is $F^r \times \{0\} \subset F^m$, which has dimension $r$, thus $\mathrm{rank}(J_r) = r$. This proves one implication.

Assume now that $\mathrm{rank}(A) = r$ and let $T : F^n \to F^m$ be the linear map sending $X$ to $AX$, so that $A$ is the matrix of $T$ with respect to the canonical bases of $F^n$ and $F^m$. Thanks to Theorem 5.43, it suffices to prove that we can find two bases $\mathcal{B}_1, \mathcal{B}_2$ of $F^n, F^m$ respectively such that the matrix of $T$ with respect to $\mathcal{B}_1, \mathcal{B}_2$ is $J_r$. In order to construct $\mathcal{B}_1$ and $\mathcal{B}_2$, we start with a basis $e_1, \ldots, e_{n-r}$ of $\ker T$ (note that $\dim \ker T = n - r$ by the rank-nullity theorem) and we complete it to a basis $e_1, \ldots, e_n$ of $F^n$. Let $f_i = T(e_{n-r+i})$ for $1 \le i \le r$. We claim that $f_1, \ldots, f_r$ is a basis of $\mathrm{Im}(T)$. Since $\dim \mathrm{Im}(T) = r$, it suffices to see that $f_1, \ldots, f_r$ span $\mathrm{Im}(T)$. But any $x \in \mathrm{Im}(T)$ can be written $x = T(a_1 e_1 + \cdots + a_n e_n)$ and since $T(e_j) = 0$ for $1 \le j \le n - r$, we have
$$x = a_{n-r+1} f_1 + \cdots + a_n f_r \in \mathrm{Span}(f_1, \ldots, f_r),$$
proving the claim (this argument has already been used in the last paragraph of the proof of the rank-nullity theorem).

Complete now $f_1, \ldots, f_r$ to a basis $f_1, \ldots, f_m$ of $F^m$. Call $\mathcal{B}_1 = (e_{n-r+1}, \ldots, e_n, e_1, \ldots, e_{n-r})$ and $\mathcal{B}_2 = (f_1, \ldots, f_m)$. Then by construction the matrix of $T$ with respect to $\mathcal{B}_1, \mathcal{B}_2$ is $J_r$ and the theorem is proved. □


Corollary 5.69. Let $A, B \in M_{m,n}(F)$. Then $\mathrm{rank}(A) = \mathrm{rank}(B)$ if and only if there are matrices $P \in GL_m(F)$ and $Q \in GL_n(F)$ such that $B = PAQ$.

Proof. If $B = PAQ$ with $P, Q$ invertible, then the result follows from part c) of Remark 5.63. Assume that $\mathrm{rank}(A) = \mathrm{rank}(B) = r$; then by the previous theorem we can write $A = P_1 J_r Q_1$ and $B = P_2 J_r Q_2$ for invertible matrices $P_1, P_2, Q_1, Q_2$. Setting $P = P_2 P_1^{-1}$ and $Q = Q_1^{-1} Q_2$ we obtain $B = PAQ$. □
Problem 5.70. Prove that for all $A \in M_{m,n}(F)$ we have
$$\mathrm{rank}(A) = \mathrm{rank}({}^tA).$$

Solution. Say $A$ has rank $r$ and write $A = P J_r Q$ with $P \in GL_m(F)$ and $Q \in GL_n(F)$. Then ${}^tA = {}^tQ\, {}^tJ_r\, {}^tP$ and since ${}^tP, {}^tQ$ are invertible, we have $\mathrm{rank}({}^tA) = \mathrm{rank}({}^tJ_r)$. Since ${}^tJ_r$ is the matrix $J_r$ of size $n \times m$, which has rank $r$, we conclude that $\mathrm{rank}({}^tA) = r = \mathrm{rank}(A)$. □


Problem 5.71. Let $A \in M_n(\mathbb{C})$. Find, as a function of $A$, the smallest integer $r$ such that $A$ can be written as a sum of $r$ matrices of rank 1.

Solution. For all matrices $X, Y \in M_n(\mathbb{C})$ we have
$$\mathrm{rank}(X + Y) \le \mathrm{rank}(X) + \mathrm{rank}(Y),$$
thus if $A = A_1 + \cdots + A_r$ with $\mathrm{rank}(A_i) = 1$, then
$$\mathrm{rank}(A) = \mathrm{rank}\Big(\sum_{i=1}^{r} A_i\Big) \le \sum_{i=1}^{r} \mathrm{rank}(A_i) = r.$$
We will prove that we can write $A$ as a sum of $\mathrm{rank}(A)$ matrices of rank 1, which will imply that the answer to the problem is $\mathrm{rank}(A)$. Indeed, if $A$ has rank $k$, write $A = P J_k Q$ for some $P, Q \in GL_n(\mathbb{C})$. Thus $A = A_1 + A_2 + \cdots + A_k$, where $A_i = P E_{ii} Q$ and $E_{ii}$ is the matrix having all entries 0 except for entry $(i, i)$, which is 1. Clearly $A_i$ has rank 1 (since $P, Q$ are invertible and $E_{ii}$ has rank 1). □


Problem 5.72. Let $A \in M_n(F)$ have rank $r \in [1, n - 1]$. Prove that there exist $B \in M_{n,r}(F)$, $C \in M_{r,n}(F)$ with
$$\mathrm{rank}(B) = \mathrm{rank}(C) = r,$$
such that $A = BC$.

Solution. Write $A = P J_r Q$, where $P, Q$ are invertible $n \times n$ matrices. Note that choosing
$$B_1 = \begin{pmatrix} I_r \\ 0 \end{pmatrix} \in M_{n,r}(F) \quad \text{and} \quad C_1 = \begin{pmatrix} I_r & 0 \end{pmatrix} \in M_{r,n}(F)$$
we have $J_r = B_1 C_1$, and $B_1, C_1$ both have rank $r$. But then
$$A = P J_r Q = (P B_1)(C_1 Q)$$
and $B = P B_1 \in M_{n,r}(F)$, $C = C_1 Q \in M_{r,n}(F)$ both have rank $r$, since $P, Q$ are invertible (Remark 5.63). The problem is solved. □
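The proof above goes through $A = P J_r Q$. A different, equally standard way to obtain such a factorization in practice (swapped in here; it is not the construction used in the text) is the column-row factorization: take for $B$ the pivot columns of $A$ and for $C$ the nonzero rows of $A_{\mathrm{ref}}$. The sketch assumes a nonzero matrix with rational entries; `rref`, `cr_factorization` and `matmul` are hypothetical helpers. With $r = 1$ it produces vectors $x, y$ with $a_{ij} = x_i y_j$ as in Problem 5.73, and summing the products of column $k$ of $B$ with row $k$ of $C$ gives a decomposition into $r$ rank-one matrices as in Problem 5.71.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form and pivot columns, over the rationals."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0])):
        if r == len(R):
            break
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def cr_factorization(A):
    """Return (B, C) with A = B C, B = pivot columns of A, C = nonzero rows of RREF(A)."""
    R, pivots = rref(A)
    B = [[Fraction(row[p]) for p in pivots] for row in A]  # size m x r
    C = R[:len(pivots)]                                     # size r x n
    return B, C


def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]  # example of rank 2
B, C = cr_factorization(A)
print(len(C), matmul(B, C) == A)  # 2 True

# Rank-one case (Problem 5.73): a_ij = x_i * y_j.
B1, C1 = cr_factorization([[2, 4], [3, 6]])
x = [row[0] for row in B1]  # x = (2, 3)
y = C1[0]                   # y = (1, 2)
```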


Problem 5.73. Let $A = [a_{ij}] \in M_n(\mathbb{C})$ be a matrix of rank 1. Prove that there exist complex numbers $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n$ such that $a_{ij} = x_i y_j$ for all integers $1 \le i, j \le n$.


194 5 Linear Transformations 


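Solution. According to the previous problem there exist two matrices
$$B \in M_{n,1}(\mathbb{C}), \quad C \in M_{1,n}(\mathbb{C})$$
so that $A = BC$. If
$$B = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad C = \begin{pmatrix} y_1 & y_2 & \cdots & y_n \end{pmatrix},$$
then
$$A = BC = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n y_1 & x_n y_2 & \cdots & x_n y_n \end{pmatrix}. \qquad \square$$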


5.4.1 Problems for practice 


1. a) Find the rank of the linear transformation
$$T : \mathbb{R}^3 \to \mathbb{R}^3, \quad T(x, y, z) = (x - y,\ y - z,\ z - x).$$

b) Answer the same question with $\mathbb{R}$ replaced with $\mathbb{F}_2$.
2. Let $T$ be the linear transformation on $\mathbb{R}^3$ whose matrix with respect to the canonical basis is
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 1 & 1 & 1 \end{pmatrix}.$$
Find a basis of $\mathrm{Im}(T)$ and $\ker(T)$, and compute the rank of $T$.
3. Compute the rank of the matrices 


011 3 

11 1 —2 1111 
A=|01-3 4 |, B=]211-4 
22 2 —4 222 2 


322-3 


4. Let $A, B \in M_3(F)$ be two matrices such that $AB = O_3$. Prove that
$$\min(\mathrm{rank}(A), \mathrm{rank}(B)) \le 1.$$

5. Let $A \in M_3(\mathbb{C})$ be a matrix such that $A^2 = O_3$.

a) Prove that $A$ has rank 0 or 1.
b) Deduce the general form of all matrices $A \in M_3(\mathbb{C})$ such that $A^2 = O_3$.

6. Find the rank of the matrix $A = [\cos(i - j)]_{1 \le i, j \le n}$.

7. a) Let $V$ be an $n$-dimensional vector space over $F$ and let $T : V \to V$ be a linear transformation. Let $T^j$ be the $j$-fold iterate of $T$ (so $T^2 = T \circ T$, $T^3 = T \circ T \circ T$, etc.). Prove that
$$\mathrm{Im}(T^n) = \mathrm{Im}(T^{n+1}).$$
Hint: check that if $\mathrm{Im}(T^j) = \mathrm{Im}(T^{j+1})$ for some $j$, then $\mathrm{Im}(T^k) = \mathrm{Im}(T^{k+1})$ for $k > j$.
b) Let $A \in M_n(\mathbb{C})$ be a matrix. Prove that $A^n$ and $A^{n+1}$ have the same rank.

8. Let $A \in M_n(F)$ be a matrix of rank 1. Prove that
$$A^2 = \mathrm{Tr}(A)\, A.$$

9. Let $A \in M_m(F)$ and $B \in M_n(F)$. Prove that
$$\mathrm{rank}\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} = \mathrm{rank}(A) + \mathrm{rank}(B).$$

10. Prove that for any matrices $A \in M_{n,m}(F)$ and $B \in M_m(F)$ we have
$$\mathrm{rank}\begin{pmatrix} I_n & A \\ 0 & B \end{pmatrix} = n + \mathrm{rank}(B).$$

11. Let $n \ge 2$ and let $A = [a_{ij}] \in M_n(\mathbb{C})$ be a matrix of rank 2. Prove the existence of complex numbers $x_i, y_i, z_i, t_i$ for $1 \le i \le n$ such that for all $i, j \in \{1, 2, \ldots, n\}$ we have
$$a_{ij} = x_i y_j + z_i t_j.$$

12. Let $A = (a_{ij})_{1 \le i, j \le n}$, $B = (b_{ij})_{1 \le i, j \le n}$ be complex matrices such that
$$a_{ij} = 2^{i-j}\, b_{ij}$$
for all integers $1 \le i, j \le n$. Prove that $\mathrm{rank}\, A = \mathrm{rank}\, B$.
13. Let $A \in M_n(\mathbb{C})$ be a matrix such that $A^2 = A$, i.e., $A$ is the matrix of a projection. Prove that
$$\mathrm{rank}(A) + \mathrm{rank}(I_n - A) = n.$$

14. Let $n > k$ and let $A_1, \ldots, A_k \in M_n(\mathbb{R})$ be matrices of rank $n - 1$. Prove that $A_1 A_2 \cdots A_k$ is nonzero. Hint: using Sylvester's inequality, prove that $\mathrm{rank}(A_1 \cdots A_j) \ge n - j$ for $1 \le j \le k$.

15. Let $A \in M_n(\mathbb{C})$ be a matrix of rank at least $n - 1$. Prove that $\mathrm{rank}(A^k) \ge n - k$ for $1 \le k \le n$. Hint: use Sylvester's inequality.

16. a) Prove that for any matrix $A \in M_n(\mathbb{R})$ we have
$$\mathrm{rank}(A) = \mathrm{rank}({}^tA A).$$
Hint: if $X \in \mathbb{R}^n$ is a column vector such that ${}^tA A X = 0$, write ${}^tX\, {}^tA A X = 0$ and express the left-hand side as a sum of squares.

b) Let A = | l : 1 I Find the rank of A and ‘AA and conclude that part a) of 
l — 


the problem is no longer true if $\mathbb{R}$ is replaced with $\mathbb{C}$.

17. Let $A$ be an $m \times n$ matrix with rank $r$. Prove that there is an $m \times m$ matrix $B$ with rank $m - r$ such that $BA = O_{m,n}$.

18. (Generalized inverses) Let $A \in M_{m,n}(F)$. A generalized inverse of $A$ is a matrix $X \in M_{n,m}(F)$ such that $AXA = A$.

a) If $m = n$ and $A$ is invertible, show that the only generalized inverse of $A$ is $A^{-1}$.

b) Show that a generalized inverse of $A$ always exists.

c) Give an example to show that the generalized inverse need not be unique.


Chapter 6 
Duality 


Abstract After an in-depth study of duality for finite dimensional vector spaces, 
we prove Jordan’s classification result of nilpotent transformations on a finite 
dimensional vector space. We also explain how to describe vector subspaces by 
equations using hyperplanes. 


Keywords Duality • Dual basis • Linear form • Hyperplane • Orthogonal


This chapter focuses on a restricted class of linear maps between vector spaces, 
namely linear maps between a vector space and the field of scalars (seen as a vector 
space of dimension 1 over itself). Such linear maps are called linear forms on the 
vector space. Even though the whole chapter might look rather formal at first sight, 
the study of linear forms (known as duality) on finite dimensional vector spaces 
is very important and yields a lot of surprising properties. For instance, we will 
use duality to prove a famous result due to Jordan which completely classifies the 
nilpotent linear transformations on a finite dimensional vector space. This is one 
of the most important results in linear algebra! We will also use duality in the last 
chapter, in a more geometric context. 


6.1 The Dual Basis 


We fix a field $F$ in the sequel. The reader may take $F \in \{\mathbb{R}, \mathbb{C}\}$ if they prefer.


Definition 6.1. The dual $V^*$ of a vector space $V$ over $F$ is the set of linear maps $l : V \to F$, endowed with the structure of a vector space over $F$ by defining
$$(l_1 + l_2)(v) = l_1(v) + l_2(v) \quad \text{and} \quad (cl)(v) = c\, l(v)$$
for $l_1, l_2, l \in V^*$, $v \in V$ and $c \in F$.

We leave to the reader the immediate verification of the vector space axioms, which show that $V^*$ is indeed a vector space over $F$ when endowed with the previous operations. An element $l$ of $V^*$ is called a linear form on $V$. These objects are not very mysterious: assume for instance that $V = F^n$ and let $e_1, \ldots, e_n$ be the canonical basis. Then for all $(x_1, \ldots, x_n) \in V$ we have
$$l(x_1, \ldots, x_n) = l(x_1 e_1 + \cdots + x_n e_n) = x_1 l(e_1) + \cdots + x_n l(e_n) = a_1 x_1 + \cdots + a_n x_n,$$
where $a_i = l(e_i) \in F$. Conversely, any map of the form $(x_1, \ldots, x_n) \mapsto a_1 x_1 + \cdots + a_n x_n$ is a linear form on $F^n$. In general, if $V$ is a finite dimensional vector space and $e_1, \ldots, e_n$ is a basis of $V$, then the linear forms on $V$ are precisely those maps $l : V \to F$ of the form
$$l(x_1 e_1 + \cdots + x_n e_n) = a_1 x_1 + \cdots + a_n x_n$$
with $a_1, \ldots, a_n \in F$.
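The computation above says that a linear form on $F^n$ is determined by its values $a_i = l(e_i)$ on the canonical basis. A minimal sketch, assuming the form is handed to us as a Python function; `coefficients` and `sample_form` are hypothetical names, and the particular form is made up for illustration.

```python
def coefficients(form, n):
    """Recover a_i = l(e_i) for a linear form l on F^n given as a Python function."""
    canonical = [[1 if j == i else 0 for j in range(n)] for i in range(n)]
    return [form(e) for e in canonical]


def sample_form(x):
    # A made-up linear form on R^3, used only for illustration.
    return 2 * x[0] - x[1] + 5 * x[2]


a = coefficients(sample_form, 3)
print(a)  # [2, -1, 5]
v = [4, 7, -3]
print(sample_form(v) == sum(ai * xi for ai, xi in zip(a, v)))  # True
```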
By definition we have a canonical map
$$V^* \times V \to F, \quad (l, v) \mapsto l(v).$$
We also denote this map as $(l, v) \mapsto \langle l, v \rangle$ and call it the canonical pairing between $V$ and its dual. Unwinding definitions, we obtain the useful formulae
$$\langle c l_1 + l_2, v \rangle = c \langle l_1, v \rangle + \langle l_2, v \rangle \quad \text{and} \quad \langle l, c v_1 + v_2 \rangle = c \langle l, v_1 \rangle + \langle l, v_2 \rangle.$$
The canonical pairing is a key example of a bilinear form, a topic which will be studied in much greater depth in subsequent chapters.
Each vector $v \in V$ gives rise to a natural linear form
$$\mathrm{ev}_v : V^* \to F, \quad l \mapsto l(v)$$
on $V^*$, obtained by evaluating linear forms at $v$. We therefore obtain a map
$$\iota : V \to V^{**}, \quad \iota(v) = \mathrm{ev}_v,$$
called the canonical biduality map. Note that by definition
$$\langle \iota(v), l \rangle = \langle l, v \rangle$$
for all linear forms $l$ on $V$ and all vectors $v \in V$. A fundamental property of the biduality map is that it is always injective. In other words, if $v$ is a nonzero vector in $V$, then we can always find a linear form $l$ on $V$ such that $l(v) \ne 0$. The proof of this rather innocent-looking statement uses the existence of bases for general vector spaces, so we prefer to take the following theorem for granted: we will see shortly that the biduality map is an isomorphism when $V$ is finite dimensional, with an easy proof, and this is all we will need in this book.

Theorem 6.2. For any vector space $V$ over $F$, the canonical biduality map $\iota : V \to V^{**}$ is injective.


Before moving on, let us introduce a useful and classical notation, called the Kronecker symbol:

Definition 6.3. If $i, j$ are integers, we let $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$.

Let us assume now that $V$ is finite dimensional, of dimension $n \ge 1$, and let us consider a basis $e_1, e_2, \ldots, e_n$ of $V$. If $v$ is a vector in $V$, then we can write $v = x_1 e_1 + \cdots + x_n e_n$ for some scalars $x_1, \ldots, x_n$ which are uniquely determined. Define the $i$th coordinate form by
$$e_i^* : V \to F, \quad e_i^*(v) = x_i \quad \text{if } v = x_1 e_1 + \cdots + x_n e_n.$$
Thus by definition for all $v \in V$ we have
$$v = \sum_{i=1}^{n} e_i^*(v)\, e_i,$$
or equivalently
$$v = \sum_{i=1}^{n} \langle e_i^*, v \rangle e_i.$$
Note that for all $1 \le i, j \le n$ we have
$$e_i^*(e_j) = \delta_{ij}.$$


We are now ready to state and prove the first fundamental result of this chapter: 


Theorem 6.4. Let $V$ be a vector space of dimension $n \ge 1$ and let $e_1, \ldots, e_n$ be a basis of $V$. Then the coordinate forms $e_1^*, \ldots, e_n^*$ form a basis of $V^*$ as a vector space over $F$.

Proof. Let us check first that $e_i^*$ is an element of $V^*$, i.e., that $e_i^*$ is linear. But if $x = x_1 e_1 + \cdots + x_n e_n$ and $y = y_1 e_1 + \cdots + y_n e_n$, and if $c \in F$ is a scalar, then
$$x + cy = (x_1 + c y_1) e_1 + \cdots + (x_n + c y_n) e_n,$$
thus
$$e_i^*(x + cy) = x_i + c y_i = e_i^*(x) + c\, e_i^*(y),$$
so $e_i^*$ is linear.

Next, let us prove that $e_1^*, \ldots, e_n^*$ are linearly independent. Suppose that $c_1, \ldots, c_n \in F$ are scalars such that
$$c_1 e_1^* + \cdots + c_n e_n^* = 0.$$
Evaluating at $e_i$ yields
$$c_1 \langle e_1^*, e_i \rangle + \cdots + c_n \langle e_n^*, e_i \rangle = 0.$$
The left-hand side equals
$$\sum_{j=1}^{n} c_j \langle e_j^*, e_i \rangle = \sum_{j=1}^{n} c_j \delta_{ji} = c_i.$$
Thus $c_i = 0$ for all $i$ and so $e_1^*, \ldots, e_n^*$ are linearly independent.

Finally, let us prove that $e_1^*, \ldots, e_n^*$ form a generating family for $V^*$. Let $l \in V^*$ be an arbitrary linear form. If $v = x_1 e_1 + \cdots + x_n e_n$ is a vector in $V$, then the linearity of $l$ gives
$$\langle l, v \rangle = x_1 \langle l, e_1 \rangle + \cdots + x_n \langle l, e_n \rangle = \langle l, e_1 \rangle \langle e_1^*, v \rangle + \cdots + \langle l, e_n \rangle \langle e_n^*, v \rangle = \big\langle \langle l, e_1 \rangle e_1^* + \cdots + \langle l, e_n \rangle e_n^*, v \big\rangle,$$
showing that
$$l = \langle l, e_1 \rangle e_1^* + \langle l, e_2 \rangle e_2^* + \cdots + \langle l, e_n \rangle e_n^*.$$
Thus $l$ belongs to the span of $e_1^*, \ldots, e_n^*$, which finishes the proof of the theorem. □
Remark 6.5. The proof shows that for any $l \in V^*$ we have the useful relation
$$l = \sum_{i=1}^{n} \langle l, e_i \rangle e_i^*.$$
This is the “dual” relation of the tautological relation
$$v = \sum_{i=1}^{n} \langle e_i^*, v \rangle e_i,$$
valid for all $v \in V$.


The previous theorem explains the following: 


Definition 6.6. If $e_1, \dots, e_n$ is a basis of a vector space $V$ over $F$, we call
$e_1^*, \dots, e_n^*$ the dual basis of $e_1, \dots, e_n$. It is uniquely characterized by the
property that

$$e_i^*(e_j) = \delta_{ij} \quad \text{for all } 1 \le i, j \le n.$$


A crucial consequence of the previous theorem is the following: 
Corollary 6.7. For all finite dimensional vector spaces $V$ over $F$ we have

$$\dim V = \dim V^*.$$

Moreover, the canonical biduality map $\iota: V \to V^{**}$ is an isomorphism of vector
spaces over $F$.

Proof. The first part is clear from the previous theorem and the fact that all bases
in a finite dimensional vector space have the same number of elements, namely the
dimension of the space. To prove that $\iota$ is an isomorphism, it suffices to prove that $\iota$
is injective, since

$$\dim V = \dim V^* = \dim V^{**},$$

as follows from what we have already proved.
So suppose that $\iota(v) = 0$, which means that $\langle l, v\rangle = 0$ for all $l \in V^*$. Let
$e_1, \dots, e_n$ be a basis of $V$. Then $\langle e_i^*, v\rangle = 0$ for all $1 \le i \le n$, and since

$$v = \sum_{i=1}^n \langle e_i^*, v\rangle e_i,$$

we obtain $v = 0$, establishing therefore the injectivity of $\iota$. □


Remark 6.8. Conversely, one can prove that if the biduality map is an isomorphism, 
then V is finite dimensional. In other words, the biduality map is never an 
isomorphism for an infinite dimensional vector space! 


Recall that $\mathbb{R}_n[X]$ is the vector space of polynomials with real coefficients
whose degree does not exceed $n$.


Problem 6.9. Let $V = \mathbb{R}_n[X]$. It is easy to see that the maps $P \mapsto P^{(i)}(0)$
(where $P^{(i)}$ is the $i$th derivative of $P$) are elements of $V^*$. Express the dual basis of
$1, X, \dots, X^n$ in terms of these maps.

Solution. Let $e_i = X^i \in V$ for $0 \le i \le n$ and let $e_0^*, \dots, e_n^*$ be the dual basis. By definition
$e_i^*(e_j) = \delta_{ij}$. Thus for all $P = a_0 + a_1X + \dots + a_nX^n \in V$ we have

$$e_i^*(P) = a_i = \frac{1}{i!}P^{(i)}(0).$$

Thus $e_i^*$ is the linear form given by $P \mapsto \frac{1}{i!}P^{(i)}(0)$. □
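To make Problem 6.9 concrete, here is a minimal Python sketch. It assumes that a polynomial of $\mathbb{R}_n[X]$ is stored as its coefficient list $[a_0, a_1, \dots, a_n]$; this representation and the names `derivative` and `dual_basis_form` are hypothetical, chosen only for this illustration. The sketch computes $e_i^*(P) = \frac{1}{i!}P^{(i)}(0)$ by repeated formal differentiation, using exact rational arithmetic from the standard `fractions` module, and checks that the result is the coefficient $a_i$.

```python
from fractions import Fraction
from math import factorial


def derivative(coeffs):
    """Coefficients [a0, a1, ..., an] of P  ->  coefficients of P'."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]


def value_at_zero_of_derivative(coeffs, i):
    """Return P^(i)(0) for P given by its coefficient list."""
    c = list(coeffs)
    for _ in range(i):
        c = derivative(c)
    return c[0] if c else 0  # an empty list represents the zero polynomial


def dual_basis_form(i):
    """The linear form e_i^*: P -> P^(i)(0) / i!."""
    return lambda coeffs: Fraction(value_at_zero_of_derivative(coeffs, i), factorial(i))


P = [3, -2, 5, 7]  # P = 3 - 2X + 5X^2 + 7X^3
print([str(dual_basis_form(i)(P)) for i in range(len(P))])
```

The script prints `['3', '-2', '5', '7']`, which are the coordinates of $P$ in the basis $1, X, X^2, X^3$, as the solution predicts.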


The following problem gives a beautiful and classical application of the ideas 
developed so far: 


Problem 6.10 (Lagrange Interpolation). Let $V = \mathbb{R}_n[X]$ and let $x_0, \dots, x_n$ be
pairwise distinct real numbers. For $0 \le i \le n$ define

$$L_i(X) = \prod_{\substack{0 \le j \le n\\ j \ne i}} \frac{X - x_j}{x_i - x_j}.$$

a) Show that

$$L_i(x_j) = \delta_{ij} \quad \text{for all } 0 \le i, j \le n.$$

b) Prove that $L_0, \dots, L_n$ form a basis of $V$.
c) Describe the dual basis of $L_0, \dots, L_n$.
d) Prove Lagrange's interpolation formula: for all $P \in V$ we have

$$P = \sum_{i=0}^n P(x_i)L_i.$$

e) Prove that for any $b_0, \dots, b_n \in \mathbb{R}$ we can find a unique polynomial $P \in V$
with $P(x_i) = b_i$ for $0 \le i \le n$. This polynomial $P$ is called the Lagrange
interpolation polynomial associated with $b_0, \dots, b_n$.


Solution. a) By construction we have $L_i(x_j) = 0$ for $j \ne i$. On the other hand,

$$L_i(x_i) = \prod_{j \ne i} \frac{x_i - x_j}{x_i - x_j} = 1,$$

thus

$$L_i(x_j) = \delta_{ij}.$$

b) Since $\dim V = n + 1$ (a basis being given by $1, X, \dots, X^n$), it suffices to check
that $L_0, \dots, L_n$ are linearly independent. Suppose that $a_0L_0 + \dots + a_nL_n = 0$
for some scalars $a_0, \dots, a_n$. Evaluating this equality at $x_i$ and using part a) yields

$$0 = \sum_{j=0}^n a_jL_j(x_i) = \sum_{j=0}^n a_j\delta_{ji} = a_i$$

for all $0 \le i \le n$, thus $L_0, \dots, L_n$ are linearly independent.
c) By definition of the dual basis and by part a), we have

$$L_i^*(L_j) = \delta_{ij} = \delta_{ji} = L_j(x_i)$$

for all $i, j$. Fix $i \in \{0, \dots, n\}$. The linear forms $L_i^*$ and $P \mapsto P(x_i)$ agree on $L_j$
for all $0 \le j \le n$, and since $L_0, \dots, L_n$ span $V$, we deduce that

$$L_i^*(P) = P(x_i) \quad \text{for all } P \in V.$$

d) By definition of the dual basis

$$P = \sum_{i=0}^n \langle L_i^*, P\rangle L_i.$$

By part c) we have $\langle L_i^*, P\rangle = P(x_i)$, which yields the desired result.
e) It suffices to take $P = \sum_{i=0}^n b_iL_i$ in order to prove the existence part. For
uniqueness, if $Q \in V$ also satisfies $Q(x_i) = b_i$ for $0 \le i \le n$, it follows that
$P - Q$ is a polynomial whose degree does not exceed $n$ and which has at least
$n + 1$ distinct roots, thus $P - Q = 0$ and $P = Q$. □
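Parts a), d) and e) can be checked numerically. The sketch below is an illustration under stated assumptions only. Polynomials are coefficient lists (lowest degree first), the nodes are exact rationals, and the helper names are hypothetical. The sketch builds each $L_i$ as the product $\prod_{j \ne i}(X - x_j)/(x_i - x_j)$ and then forms $P = \sum_i b_iL_i$.

```python
from fractions import Fraction


def poly_mul_linear(coeffs, root):
    """Multiply a polynomial (coefficients, lowest degree first) by (X - root)."""
    out = [Fraction(0)] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        out[k + 1] += c        # c*X^k times X
        out[k] -= root * c     # c*X^k times (-root)
    return out


def evaluate(coeffs, x):
    """Evaluate a coefficient list at x with Horner's scheme."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


def lagrange_basis(nodes):
    """Coefficient lists of L_0, ..., L_n for pairwise distinct nodes x_0, ..., x_n."""
    basis = []
    for i, xi in enumerate(nodes):
        numerator = [Fraction(1)]
        denominator = Fraction(1)
        for j, xj in enumerate(nodes):
            if j != i:
                numerator = poly_mul_linear(numerator, xj)
                denominator *= xi - xj
        basis.append([c / denominator for c in numerator])
    return basis


def interpolate(nodes, values):
    """Coefficients of P = sum_i b_i L_i, the unique P in R_n[X] with P(x_i) = b_i."""
    basis = lagrange_basis(nodes)
    n = len(nodes)
    return [sum(values[i] * basis[i][k] for i in range(n)) for k in range(n)]


nodes = [0, 1, 2]
print([str(c) for c in interpolate(nodes, [1, 3, 7])])  # ['1', '1', '1'], i.e. 1 + X + X^2
```

Exact `Fraction` arithmetic is used so that testing $L_i(x_j) = \delta_{ij}$ for equality is meaningful. Floating point would introduce rounding errors that make such exact comparisons unreliable.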


Problem 6.11. Let $x_0, \dots, x_n \in [0, 1]$ be pairwise distinct and let $V = \mathbb{R}_n[X]$.
a) Prove that the map $I: V \to \mathbb{R}$ defined by

$$I(P) = \int_0^1 P(x)\,dx$$

is a linear form on $V$.
b) Using the previous problem, prove that there is a unique $(n+1)$-tuple $(a_0, \dots, a_n)$
of real numbers such that

$$\int_0^1 P(x)\,dx = \sum_{i=0}^n a_iP(x_i)$$

for all $P \in V$.


Solution. a) This is a direct consequence of basic properties of integral calculus.

b) We use the result and notations of the previous problem, which establishes that
$L_0^*, \dots, L_n^*$ is a basis of $V^*$, and $L_i^*(P) = P(x_i)$ for all $P \in V$. Thus saying
that

$$\int_0^1 P(x)\,dx = \sum_{i=0}^n a_iP(x_i)$$

for all $P \in V$ is equivalent to saying that

$$I(P) = \sum_{i=0}^n a_iL_i^*(P)$$

for all $P \in V$, in other words

$$I = \sum_{i=0}^n a_iL_i^*$$

as elements of $V^*$. Since $L_0^*, \dots, L_n^*$ is a basis of $V^*$, the existence and
uniqueness of $a_0, \dots, a_n$ is clear. □
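The weights can also be computed directly. Both $I$ and $\sum_i a_iL_i^*$ are linear forms on $\mathbb{R}_n[X]$, so they coincide as soon as they agree on the basis $1, X, \dots, X^n$. This gives the linear system $\sum_{i=0}^n a_ix_i^k = \frac{1}{k+1}$ for $0 \le k \le n$. (Equivalently, by Remark 6.5 applied to the basis $L_0, \dots, L_n$, we have $a_i = I(L_i) = \int_0^1 L_i(x)\,dx$.) Below is a minimal Python sketch of the first approach. It assumes exact rational nodes, and `solve` and `quadrature_weights` are hypothetical helper names.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination.

    The matrix is assumed invertible (here it is a Vandermonde matrix with
    pairwise distinct nodes); otherwise `next` raises StopIteration.
    """
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def quadrature_weights(nodes):
    """Weights a_0, ..., a_n with  int_0^1 P = sum_i a_i P(x_i)  for deg P <= n.

    Both sides are linear forms on R_n[X], so it suffices that they agree on
    1, X, ..., X^n:  sum_i a_i x_i^k = 1/(k+1)  for k = 0, ..., n.
    """
    xs = [Fraction(x) for x in nodes]
    n = len(xs)
    matrix = [[x ** k for x in xs] for k in range(n)]
    rhs = [Fraction(1, k + 1) for k in range(n)]
    return solve(matrix, rhs)


print([str(a) for a in quadrature_weights([0, Fraction(1, 2), 1])])
```

For the nodes $0, \frac12, 1$ the script prints `['1/6', '2/3', '1/6']`. These are the weights of Simpson's formula on $[0, 1]$ (compare Problem 11 of the practice problems below).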
Let us now consider the following practical problem: given a basis $v_1, \dots, v_n$
of $\mathbb{R}^n$, express the dual basis $v_1^*, \dots, v_n^*$ in terms of the dual basis $e_1^*, \dots, e_n^*$ of the
canonical basis of $\mathbb{R}^n$. To do so, write

$$v_j = \sum_{i=1}^n a_{ij}e_i \qquad \text{and} \qquad v_i^* = \sum_{k=1}^n b_{ki}e_k^*.$$

Note that in practice we have easy access to the matrix $A = [a_{ij}]$: its columns are
precisely the coordinates of $v_1, \dots, v_n$ with respect to the canonical basis $e_1, \dots, e_n$
of $\mathbb{R}^n$. We are interested in finding $B = [b_{ij}]$. Using the identity $v_i^*(v_j) = \delta_{ij}$, we
obtain

$$\delta_{ij} = v_i^*(v_j) = \sum_{k=1}^n b_{ki}e_k^*(v_j) = \sum_{k=1}^n\sum_{l=1}^n b_{ki}a_{lj}e_k^*(e_l)
= \sum_{k=1}^n\sum_{l=1}^n a_{lj}b_{ki}\delta_{kl} = \sum_{k=1}^n a_{kj}b_{ki} = ({}^tB\cdot A)_{ij}.$$

Since this holds for all $i, j$, we deduce that

$${}^tB \cdot A = I_n, \quad \text{i.e.,} \quad B = {}^t(A^{-1}).$$

Thus in practice we need to compute $A^{-1}$ (via row-reduction on the matrix $(A \mid I_n)$)
and take the transpose!
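The recipe $B = {}^t(A^{-1})$ translates directly into code. Below is a minimal Python sketch that assumes vectors with rational coordinates; `inverse` and `dual_basis` are hypothetical names. It row-reduces $(A \mid I_n)$ exactly and returns, for each $i$, the coordinates of $v_i^*$ in $e_1^*, \dots, e_n^*$. These coordinates form the $i$th column of $B$, that is, the $i$th row of $A^{-1}$. The data are those of Problem 6.12 below.

```python
from fractions import Fraction


def inverse(matrix):
    """Exact inverse of a square matrix by row-reducing (A | I_n)."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]          # make the pivot equal to 1
        for r in range(n):
            if r != col and aug[r][col] != 0:         # clear the rest of the column
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dual_basis(vectors):
    """Coordinates of v_1^*, ..., v_n^* in the basis e_1^*, ..., e_n^*.

    A has the v_j as columns and B = transpose(A^{-1}); since v_i^* = sum_k b_ki e_k^*,
    the coordinates of v_i^* are the i-th column of B, i.e. the i-th row of A^{-1}.
    """
    n = len(vectors)
    A = [[vectors[j][i] for j in range(n)] for i in range(n)]   # column j = v_j
    return inverse(A)


v = [(-3, 2, 1), (-1, 1, 1), (0, -2, 3)]   # the basis of Problem 6.12
for i, coeffs in enumerate(dual_basis(v), start=1):
    print(f"v{i}* =", [str(c) for c in coeffs])
```

The three printed lists are `['-5/7', '-3/7', '-2/7']`, `['8/7', '9/7', '6/7']` and `['-1/7', '-2/7', '1/7']`. They are the coefficients of $v_1^*, v_2^*, v_3^*$ with respect to $e_1^*, e_2^*, e_3^*$.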


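Problem 6.12. Let $e_1^*, e_2^*, e_3^*$ be the dual basis of the canonical basis of $\mathbb{R}^3$. Express
in terms of $e_1^*, e_2^*, e_3^*$ the dual basis of the basis of $\mathbb{R}^3$ consisting of

$$v_1 = \begin{pmatrix}-3\\2\\1\end{pmatrix}, \quad v_2 = \begin{pmatrix}-1\\1\\1\end{pmatrix}, \quad v_3 = \begin{pmatrix}0\\-2\\3\end{pmatrix}.$$

Solution. We leave it to the reader to check that $v_1, v_2, v_3$ form a basis of $\mathbb{R}^3$, using
row-reduction on the matrix $A$ whose columns are $v_1, v_2, v_3$. Using row-reduction
on $(A \mid I_3)$, one obtains

$$A^{-1} = \frac{1}{7}\begin{pmatrix}-5 & -3 & -2\\ 8 & 9 & 6\\ -1 & -2 & 1\end{pmatrix}.$$

With the above notations

$$B = {}^t(A^{-1}) = \frac{1}{7}\begin{pmatrix}-5 & 8 & -1\\ -3 & 9 & -2\\ -2 & 6 & 1\end{pmatrix}$$

and then

$$v_1^* = -\frac{5}{7}e_1^* - \frac{3}{7}e_2^* - \frac{2}{7}e_3^*, \quad
v_2^* = \frac{8}{7}e_1^* + \frac{9}{7}e_2^* + \frac{6}{7}e_3^*, \quad
v_3^* = -\frac{1}{7}e_1^* - \frac{2}{7}e_2^* + \frac{1}{7}e_3^*.$$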
Consider now the following inverse problem: given a basis $f_1, \dots, f_n$ of $V^*$,
is there always a basis $e_1, \dots, e_n$ of $V$ whose dual basis is $f_1, \dots, f_n$? If so, how
can we find such a basis?

Let us start with any basis $v_1, \dots, v_n$ of $V$ (we know that $\dim V = n$ since we
know that $\dim V^* = n$). Of course, in practice the choice of $v_1, \dots, v_n$ will be the
natural one (for instance if $V = \mathbb{R}^n$ then we will take for $v_1, \dots, v_n$ the canonical
basis, if $V = \mathbb{R}_{n-1}[X]$, we will take for $v_1, \dots, v_n$ the basis $1, X, \dots, X^{n-1}$, etc.).
Define a matrix

$$A = [a_{ij}], \qquad a_{ij} = f_i(v_j).$$

This will be known in practice. On the other hand, we are looking for a basis
$e_1, \dots, e_n$ of $V$ such that $e_i^* = f_i$, that is

$$f_i(e_j) = \delta_{ij}$$

for $1 \le i, j \le n$. We are therefore looking for an invertible matrix $B = [b_{ij}]$ such that,
setting

$$e_j = \sum_{k=1}^n b_{kj}v_k,$$

these vectors satisfy the previous relations. These relations are equivalent to

$$\delta_{ij} = f_i(e_j) = \sum_{k=1}^n b_{kj}f_i(v_k) = \sum_{k=1}^n a_{ik}b_{kj} = (AB)_{ij},$$

that is

$$AB = I_n.$$

In other words, $e_1, \dots, e_n$ exist if and only if the matrix $A$ is invertible, and then
$e_1, \dots, e_n$ are uniquely determined by

$$B = A^{-1}.$$

It is not yet clear that the matrix $A$ is invertible. This is, however, always the case, as
the following theorem shows:


Theorem 6.13. Let $v_1, \dots, v_n$ be a basis of $V$ and let $f_1, \dots, f_n$ be a basis of $V^*$.
The matrix $A = [a_{ij}]$ with $a_{ij} = f_i(v_j)$ is invertible. Consequently (thanks to the
above discussion) for any basis $f_1, \dots, f_n$ of $V^*$ there is a unique basis $e_1, \dots, e_n$
of $V$ whose dual basis is $f_1, \dots, f_n$.
Proof. Assume that $A$ is not invertible. We can thus find a nonzero vector $X \in F^n$
with coordinates $x_1, \dots, x_n$ such that $AX = 0$. Thus for all $j \in \{1, 2, \dots, n\}$ we
have

$$0 = \sum_{i=1}^n a_{ji}x_i = \sum_{i=1}^n f_j(v_i)x_i = f_j\Big(\sum_{i=1}^n x_iv_i\Big).$$

The vector $v = x_1v_1 + \dots + x_nv_n$ is therefore nonzero (since $v_1, \dots, v_n$ are linearly
independent and $X \ne 0$) and we have

$$f_1(v) = \dots = f_n(v) = 0.$$

Since $f_1, \dots, f_n$ is a spanning set for $V^*$, we deduce that $l(v) = 0$ for all $l \in V^*$.
Thus $v$ is a nonzero vector in the kernel of the biduality map $V \to V^{**}$, which was
however shown to be an isomorphism. This contradiction shows that $A$ is invertible
and finishes the proof. □


In practice, it is helpful to know that a converse of the theorem holds: 


Theorem 6.14. Let $V$ be a vector space of dimension $n$ over a field $F$. If the matrix
$A = [a_{ij}]$ with $a_{ij} = f_i(v_j)$ is invertible for some $v_1, \dots, v_n \in V$ and some
$f_1, \dots, f_n \in V^*$, then $v_1, \dots, v_n$ form a basis of $V$ and $f_1, \dots, f_n$ form a basis
of $V^*$.


Proof. Suppose that $v_1, \dots, v_n$ are linearly dependent, say $x_1v_1 + \dots + x_nv_n = 0$
for some $x_1, \dots, x_n \in F$, not all equal to 0. Applying $f_j$ to this relation, we obtain

$$0 = f_j(x_1v_1 + \dots + x_nv_n) = a_{j1}x_1 + \dots + a_{jn}x_n$$

for all $j \in \{1, 2, \dots, n\}$, thus $AX = 0$, where $X \in F^n$ is the nonzero vector with coordinates
$x_1, \dots, x_n$, contradicting the invertibility of $A$. Thus $v_1, \dots, v_n$ are linearly independent and since
$\dim V = n$, they form a basis of $V$.

Similarly, if $f_1, \dots, f_n$ were linearly dependent, we could find a nontrivial
dependency relation $x_1f_1 + \dots + x_nf_n = 0$, which evaluated at each $v_j$ would
yield

$$\sum_{i=1}^n a_{ij}x_i = 0,$$

that is ${}^tAX = 0$ with $X \ne 0$, and ${}^tA$ (hence $A$) would not be invertible, a contradiction. Since
$\dim V^* = n$, the linearly independent family $f_1, \dots, f_n$ is a basis of $V^*$. □


Problem 6.15. Consider the following linear forms on $\mathbb{R}^3$:

$$l_1(x, y, z) = x + 2y + 3z, \quad l_2(x, y, z) = 2x + 3y + z, \quad l_3(x, y, z) = 3x + y + 2z.$$

a) Prove that $l_1, l_2, l_3$ form a basis of the dual of $\mathbb{R}^3$.
b) Find the basis of $\mathbb{R}^3$ whose dual basis is $l_1, l_2, l_3$.
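Solution. a) Consider the canonical basis $e_1, e_2, e_3$ of $\mathbb{R}^3$ and the matrix

$$A = [l_i(e_j)] = \begin{pmatrix}1 & 2 & 3\\ 2 & 3 & 1\\ 3 & 1 & 2\end{pmatrix}.$$

This matrix is invertible, as one easily shows using row-reduction. It follows from
the previous theorem that $l_1, l_2, l_3$ form a basis of the dual of $\mathbb{R}^3$.
b) We compute the inverse of $A$ using row-reduction. We obtain

$$A^{-1} = \frac{1}{18}\begin{pmatrix}-5 & 1 & 7\\ 1 & 7 & -5\\ 7 & -5 & 1\end{pmatrix}.$$

Using the previous discussion, we read the desired basis $v_1, v_2, v_3$ of $\mathbb{R}^3$ on the
columns of $A^{-1}$:

$$v_1 = \frac{1}{18}\begin{pmatrix}-5\\1\\7\end{pmatrix}, \quad v_2 = \frac{1}{18}\begin{pmatrix}1\\7\\-5\end{pmatrix}, \quad v_3 = \frac{1}{18}\begin{pmatrix}7\\-5\\1\end{pmatrix}.$$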
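The same row-reduction solves the inverse problem. In the Python sketch below (exact arithmetic; `inverse` and `basis_with_given_dual` are hypothetical names), the input is the matrix $A = [f_i(v_j)]$. For each $j$, the output gives the coordinates of $e_j$ in $v_1, \dots, v_n$, read off from the $j$th column of $A^{-1}$. Here $v_1, v_2, v_3$ is the canonical basis, so the rows of $A$ are just the coefficient vectors of the forms, and the returned coordinates are the vectors themselves.

```python
from fractions import Fraction


def inverse(matrix):
    """Exact inverse of a square matrix by row-reducing (A | I_n)."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def basis_with_given_dual(form_matrix):
    """Given A = [f_i(v_j)], return the coordinates in v_1, ..., v_n of the
    basis e_1, ..., e_n of V whose dual basis is f_1, ..., f_n.

    With e_j = sum_k b_kj v_k the conditions f_i(e_j) = delta_ij read AB = I_n,
    so the coordinates of e_j form the j-th column of B = A^{-1}.
    """
    B = inverse(form_matrix)
    n = len(B)
    return [[B[k][j] for k in range(n)] for j in range(n)]


A = [[1, 2, 3],      # l_1 = x + 2y + 3z
     [2, 3, 1],      # l_2 = 2x + 3y + z
     [3, 1, 2]]      # l_3 = 3x + y + 2z   (Problem 6.15, canonical basis)
for j, e in enumerate(basis_with_given_dual(A), start=1):
    print(f"v{j} =", [str(c) for c in e])
```

The script prints the three columns of $A^{-1}$. For instance, $v_1$ is printed as `['-5/18', '1/18', '7/18']`.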


Problem 6.16. Let $V = \mathbb{R}_2[X]$ and, for $P \in V$, set

$$l_1(P) = P(1), \quad l_2(P) = P'(1), \quad l_3(P) = \int_0^1 P(x)\,dx.$$

a) Prove that $l_1, l_2, l_3$ is a basis of $V^*$.
b) Find a basis $e_1, e_2, e_3$ of $V$ whose dual basis is $l_1, l_2, l_3$.


Solution. a) It is not difficult to check that $l_1, l_2, l_3$ are linear forms on $V$. In order
to prove that they form a basis of $V^*$, we will use the previous theorem. Namely,
we consider the canonical basis $v_1 = 1$, $v_2 = X$ and $v_3 = X^2$ of $V$ and the
matrix

$$A = [l_i(v_j)].$$

Noting that if $P = aX^2 + bX + c$ then

$$l_1(P) = a + b + c, \quad l_2(P) = 2a + b, \quad l_3(P) = \frac{a}{3} + \frac{b}{2} + c,$$

we deduce that

$$A = \begin{pmatrix}1 & 1 & 1\\ 0 & 1 & 2\\ 1 & \frac12 & \frac13\end{pmatrix}.$$

One easily checks using row-reduction that $A$ is invertible and by the previous
theorem $l_1, l_2, l_3$ form a basis of $V^*$.

b) Using the method discussed before Theorem 6.13, we see that we have to
compute the matrix $B = A^{-1}$. Row-reduction yields

$$B = A^{-1} = \begin{pmatrix}-2 & \frac12 & 3\\ 6 & -2 & -6\\ -3 & \frac32 & 3\end{pmatrix}.$$

Moreover, using that method we deduce that

$$e_1 = -2v_1 + 6v_2 - 3v_3 = -2 + 6X - 3X^2,$$

$$e_2 = \frac12 v_1 - 2v_2 + \frac32 v_3 = \frac12 - 2X + \frac32 X^2,$$

$$e_3 = 3v_1 - 6v_2 + 3v_3 = 3 - 6X + 3X^2.$$
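A direct check that $l_i(e_j) = \delta_{ij}$ guards against arithmetic slips in the row-reduction. This Python sketch assumes that the polynomial $c_0 + c_1X + c_2X^2$ is stored as `[c0, c1, c2]`, a representation chosen only for illustration. It evaluates the three forms through the formulas $P(1) = \sum c_k$, $P'(1) = \sum kc_k$ and $\int_0^1 P = \sum c_k/(k+1)$.

```python
from fractions import Fraction

# A polynomial c0 + c1*X + c2*X^2 is stored as the list [c0, c1, c2].


def l1(p):
    """l1(P) = P(1), the sum of the coefficients."""
    return sum(Fraction(c) for c in p)


def l2(p):
    """l2(P) = P'(1) = sum over k of k * c_k."""
    return sum(k * Fraction(c) for k, c in enumerate(p))


def l3(p):
    """l3(P) = integral of P over [0, 1] = sum over k of c_k / (k + 1)."""
    return sum(Fraction(c, k + 1) for k, c in enumerate(p))


half = Fraction(1, 2)
e = [[-2, 6, -3],            # e1 = -2 + 6X - 3X^2
     [half, -2, 3 * half],   # e2 = 1/2 - 2X + (3/2)X^2
     [3, -6, 3]]             # e3 = 3 - 6X + 3X^2

for name, form in [("l1", l1), ("l2", l2), ("l3", l3)]:
    print(name, [str(form(p)) for p in e])
```

Each printed row should be a row of the $3 \times 3$ identity matrix.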


6.1.1 Problems for Practice 


In the following problems we let $\mathbb{R}_n[X]$ be the space of polynomials with real
coefficients whose degree does not exceed $n$.


1. Find the dual basis of the basis of $\mathbb{R}^3$ consisting of

$$v_1 = (1, -1, 0), \quad v_2 = (0, 0, 1), \quad v_3 = (1, 1, 1).$$

2. Consider the linear forms on $\mathbb{R}^3$

$$l_1(x, y, z) = 2x + 4y + z, \quad l_2(x, y, z) = 4x + 2y + 3z, \quad l_3(x, y, z) = x + y.$$

a) Prove that $l_1, l_2, l_3$ form a basis of the dual of $\mathbb{R}^3$.
b) Find the basis of $\mathbb{R}^3$ whose dual basis is $l_1, l_2, l_3$.

3. Let $V$ be a finite dimensional vector space over a field $F$. Prove that for all
$x \ne y \in V$ we can find a linear form $l$ on $V$ such that $l(x) \ne l(y)$.
4. Define $P_0 = 1$ and, for $k \ge 1$,

$$P_k(X) = X(X - 1)\cdots(X - k + 1).$$
Also, let $f_k: \mathbb{R}_n[X] \to \mathbb{R}$ be the map defined by $f_k(P) = P(k)$.

a) Prove that $P_0, \dots, P_n$ is a basis of $\mathbb{R}_n[X]$.

b) Prove that $f_0, \dots, f_n$ is a basis of $\mathbb{R}_n[X]^*$.

c) Let $(P_0^*, \dots, P_n^*)$ be the dual basis of $(P_0, \dots, P_n)$. Express $P_k^*$ in terms of
$f_0, \dots, f_n$.

5. Let $a \ne b$ be real numbers and for $k \in \{0, 1, 2\}$ set

$$P_k(X) = (X - a)^k(X - b)^{2-k}.$$

a) Prove that $P_0, P_1, P_2$ form a basis of $\mathbb{R}_2[X]$.
b) Let $c = \frac{a+b}{2}$ and, for $\alpha \in \{a, b, c\}$, let $f_\alpha: \mathbb{R}_2[X] \to \mathbb{R}$ be the map defined
by $f_\alpha(P) = P(\alpha)$. Prove that $f_a, f_b, f_c$ form a basis of $\mathbb{R}_2[X]^*$.

c) Express the dual basis $P_0^*, P_1^*, P_2^*$ in terms of the basis $f_a, f_b, f_c$.
6. For $i \ge 0$ let $f_i: \mathbb{R}_2[X] \to \mathbb{R}$ be the map defined by

$$f_i(P) = \int_0^1 x^iP(x)\,dx.$$

a) Prove that $f_0, f_1, f_2$ form a basis of $\mathbb{R}_2[X]^*$.
b) Find a basis of $\mathbb{R}_2[X]$ whose dual basis is $f_0, f_1, f_2$.

7. Let $V$ be the vector space of all sequences $(x_n)_{n\ge0}$ of real numbers such that

$$x_{n+2} = x_{n+1} + x_n$$

for all $n \ge 0$.

a) Prove that $V$ has dimension 2.
b) Let $l_0, l_1: V \to \mathbb{R}$ be the linear forms sending a sequence $(x_n)_{n\ge0}$ to $x_0$,
respectively $x_1$. Find the basis $e_0, e_1$ of $V$ whose dual basis is $l_0, l_1$.

8. Let $X$ be a finite set and let $V$ be the space of all maps $f: X \to F$. For each
$x \in X$, consider the map $l_x: V \to F$ sending $f$ to $f(x)$. Prove that the family
$(l_x)_{x\in X}$ is a basis of $V^*$.

9. Let $l$ be a linear form on $\mathbb{R}_n[X]$ and let $k \in [0, n]$ be an integer. Prove that the
following statements are equivalent:

a) We have $l(X^kP) = 0$ for all polynomials $P \in \mathbb{R}_{n-k}[X]$.
b) There are real numbers $\alpha_0, \dots, \alpha_{k-1}$ such that for all $P \in \mathbb{R}_n[X]$

$$l(P) = \sum_{i=0}^{k-1}\alpha_iP^{(i)}(0).$$
10. a) Let $a_0, \dots, a_n$ be pairwise distinct real numbers. Prove that there is a unique
$(n+1)$-tuple of real numbers $(b_0, \dots, b_n)$ such that for any $P \in \mathbb{R}_n[X]$ we
have

$$P(0) + P'(0) = \sum_{k=0}^n b_kP(a_k).$$

b) Find such numbers $b_0, \dots, b_n$ for $n = 2$, $a_0 = 1$, $a_1 = 2$ and $a_2 = 3$.
11. Prove Simpson's formula: for all $P \in \mathbb{R}_2[X]$

$$\int_a^b P(x)\,dx = \frac{b-a}{6}\left(P(a) + 4P\Big(\frac{a+b}{2}\Big) + P(b)\right).$$

12. a) Let $l_1, l_2$ be nonzero linear forms on some nonzero vector space $V$ over $\mathbb{R}$.
Prove that we can find $v \in V$ such that $l_1(v)l_2(v)$ is nonzero.
b) Generalize this to any finite number of nonzero linear forms.
13. Let $V, W$ be vector spaces. Prove that $(V \times W)^*$ is isomorphic to $V^* \times W^*$.


6.2 Orthogonality and Equations for Subspaces 
Let $V$ be a vector space over a field $F$, let $l$ be a linear form on $V$ and $v \in V$. We say
that $l$ and $v$ are orthogonal if

$$\langle l, v\rangle = 0, \quad \text{i.e.,} \quad l(v) = 0, \quad \text{or equivalently} \quad v \in \ker l.$$

If $S$ is any subset of $V$, we let

$$S^\perp = \{l \in V^* \mid \langle l, v\rangle = 0 \ \ \forall v \in S\}$$

be the orthogonal of $S$. These are the linear forms on $V$ which vanish on $S$, or
equivalently on the span of $S$ (by linearity). Thus

$$S^\perp = \operatorname{Span}(S)^\perp.$$

Note that $S^\perp$ is a subspace of $V^*$, since if $l_1$ and $l_2$ vanish on $S$, then so does
$l_1 + cl_2$ for all scalars $c \in F$.
Similarly, if $S$ is a subset of $V^*$, we let

$$S^\perp = \{v \in V \mid l(v) = 0 \ \ \forall l \in S\}$$

be the orthogonal of $S$. The elements of $S^\perp$ are the vectors killed by all linear
forms in $S$, thus

$$S^\perp = \bigcap_{l\in S}\ker l.$$

This makes it clear that $S^\perp$ is a subspace of $V$, as intersection of the subspaces
$(\ker l)_{l\in S}$ of $V$. Again, by linearity we have

$$S^\perp = (\operatorname{Span}(S))^\perp$$

for all $S \subset V^*$.

In practice, finding the orthogonal of a subset of a finite dimensional vector space
or of its dual comes down to solving linear systems, a problem which can be easily
solved using row-reduction, for instance. Indeed, let $V$ be a finite dimensional vector
space over $F$ and let $S$ be a set of vectors in $V$. Finding $S^\perp$ comes down to finding
those linear forms $l$ on $V$ vanishing on each element of $S$. Let $e_1, \dots, e_n$ be a basis
of $V$; then a linear form $l$ on $V$ is of the form

$$l(x_1e_1 + \dots + x_ne_n) = a_1x_1 + \dots + a_nx_n$$

for some $a_1, \dots, a_n \in F$. Writing each element $s \in S$ with respect to the basis
$e_1, \dots, e_n$ yields

$$s = \alpha_{s1}e_1 + \dots + \alpha_{sn}e_n$$

for some scalars $\alpha_{si}$. Then $l \in S^\perp$ if and only if

$$a_1\alpha_{s1} + \dots + a_n\alpha_{sn} = 0$$

for all $s \in S$. This is a linear system in $a_1, \dots, a_n$, but the reader will probably
be worried that it may have infinitely many equations (if $S$ is infinite). This is
not a problem, since as we have already seen $S^\perp = (\operatorname{Span}(S))^\perp$ and $\operatorname{Span}(S)$ is
finite dimensional (since it is a subspace of $V$), thus by choosing a basis $s_1, \dots, s_k$ of
$\operatorname{Span}(S)$, we reduce the problem to solving the system

$$a_1\alpha_{s_j1} + \dots + a_n\alpha_{s_jn} = 0$$

for $1 \le j \le k$. The discussion is similar if we want to compute the orthogonal of a
subset of $V^*$.
Let us see some concrete examples: 


Problem 6.17. Consider the subspace $W$ of $\mathbb{R}^3$ defined by

$$W = \{(x, y, z) \in \mathbb{R}^3 \mid x + y + z = 0\}.$$

Give a basis of the orthogonal $W^\perp$ of $W$.

Solution. By definition, a linear form $l$ on $\mathbb{R}^3$ belongs to $W^\perp$ if and only if
$l(x, y, z) = 0$ whenever $x + y + z = 0$. In other words,

$$l(x, y, -x - y) = 0 \quad \text{for all } x, y \in \mathbb{R},$$

which can be written as

$$x\,l(1, 0, -1) + y\,l(0, 1, -1) = 0.$$

Thus $l \in W^\perp$ if and only if

$$l(1, 0, -1) = l(0, 1, -1) = 0.$$

Now, a linear form $l$ on $\mathbb{R}^3$ is of the form

$$l(x, y, z) = ax + by + cz,$$

where $a, b, c$ are real numbers. Thus $l \in W^\perp$ if and only if

$$a - c = 0, \quad b - c = 0,$$

or equivalently $a = b = c$. It follows that the linear form

$$l_0(x, y, z) = x + y + z$$

is a basis of $W^\perp$. □


Problem 6.18. Let $S = \{v_1, v_2, v_3\} \subset \mathbb{R}^4$, where

$$v_1 = (1, 0, 1, 0), \quad v_2 = (0, 1, 1, 0), \quad v_3 = (-1, 1, 0, 1).$$

Describe $S^\perp$ by giving a basis of this space.

Solution. A linear form $l$ on $\mathbb{R}^4$ is of the form

$$l(x, y, z, t) = ax + by + cz + dt,$$

where $a, b, c, d$ are real numbers. The condition $l \in S^\perp$ is equivalent to

$$l(v_1) = l(v_2) = l(v_3) = 0.$$

Thus $l \in S^\perp$ if and only if $a, b, c, d$ are solutions of the system

$$\begin{cases} a + c = 0\\ b + c = 0\\ -a + b + d = 0 \end{cases}$$

This system can be solved without difficulty: the first and second equations give
$a = b = -c$ and the third equation then yields $d = 0$, thus the solutions of the system
are $\{(u, u, -u, 0) \mid u \in \mathbb{R}\}$. The corresponding linear forms are

$$l_u(x, y, z, t) = u(x + y - z),$$

hence a basis of $S^\perp$ is given by

$$l(x, y, z, t) = x + y - z.$$

□
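Problem 6.19. Consider the set $S = \{l_1, l_2\}$, where

$$l_1(x, y, z) = 2x + 3y - z, \quad l_2(x, y, z) = x - 2y + z.$$

Find a basis for $S^\perp$.
Solution. A vector $(x, y, z)$ is in $S^\perp$ if and only if

$$l_1(x, y, z) = l_2(x, y, z) = 0,$$

that is

$$\begin{cases} 2x + 3y - z = 0\\ x - 2y + z = 0 \end{cases}$$

Solving the system (adding the two equations gives $3x + y = 0$) yields

$$y = -3x, \quad z = -7x.$$

Thus a basis of $S^\perp$ is given by $(1, -3, -7)$. □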
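The row-reduction procedure described before Problem 6.17 is easy to automate. The Python sketch below uses exact rational arithmetic, and `null_space` is a hypothetical name. Each input row can be a vector of $S \subset V$, in which case the output lists the coefficient vectors $(a_1, \dots, a_n)$ of a basis of $S^\perp \subset V^*$. Alternatively, each row can be the coefficient vector of a linear form of $S \subset V^*$, in which case the output is a basis of $S^\perp \subset V$. The basis returned may differ from the one found by hand by nonzero scalar factors.

```python
from fractions import Fraction


def null_space(rows, n):
    """Basis of {a in Q^n : sum_k r_k * a_k = 0 for every row r in rows}.

    Computed from the reduced row echelon form: one basis vector per free column.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    pivot_cols = []
    r = 0                                    # index of the next pivot row
    for c in range(n):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                         # column c corresponds to a free unknown
        m[r], m[pivot] = m[pivot], m[r]
        p = m[r][c]
        m[r] = [v / p for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivot_cols):
        vec = [Fraction(0)] * n
        vec[free] = Fraction(1)
        for row_index, pc in enumerate(pivot_cols):
            vec[pc] = -m[row_index][free]    # solve the pivot equation for x_pc
        basis.append(vec)
    return basis


# Problem 6.18: the rows are v1, v2, v3; the output gives coefficients (a, b, c, d).
print([[str(x) for x in v] for v in null_space([(1, 0, 1, 0), (0, 1, 1, 0), (-1, 1, 0, 1)], 4)])
# Problem 6.19: the rows are the coefficients of l1 and l2; the output is a basis of S^perp in R^3.
print([[str(x) for x in v] for v in null_space([(2, 3, -1), (1, -2, 1)], 3)])
```

For Problem 6.18 the function returns $(-1, -1, 1, 0)$, that is the form $-(x + y - z)$. For Problem 6.19 it returns $(-\frac17, \frac37, 1) = -\frac17(1, -3, -7)$.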


Let us continue with an easy but important theoretical exercise.


Problem 6.20. a) If $S_1 \subset S_2$ are subsets of $V$ or of $V^*$, then $S_2^\perp \subset S_1^\perp$.
b) If $S$ is a subset of $V$ or $V^*$, then $S \subset (S^\perp)^\perp$.

Solution. a) Suppose that $S_1, S_2$ are subsets of $V$. If $l \in S_2^\perp$ then $l$ vanishes on $S_2$.
Since $S_1 \subset S_2$, it follows that $l$ vanishes on $S_1$ and so $l \in S_1^\perp$. Thus $S_2^\perp \subset S_1^\perp$.
Suppose that $S_1, S_2$ are subsets of $V^*$. If $v \in S_2^\perp$, then all elements of $S_2$
vanish at $v$. Since $S_1 \subset S_2$, it follows that all elements of $S_1$ vanish at $v$ and so
$v \in S_1^\perp$. The result follows.
b) Suppose that $S \subset V$ and let $v \in S$. We need to prove that if $l \in S^\perp$, then
$\langle l, v\rangle = 0$, which is clear by definition! Similarly, if $S \subset V^*$ and $l \in S$, we need
to prove that $\langle l, v\rangle = 0$ for all $v \in S^\perp$, which is again clear. □


Remark 6.21. While it is tempting to believe that the inclusion in part b) of the
problem is actually an equality, this is false in general: $(S^\perp)^\perp$ is a subspace of
$V$ or $V^*$, while $S$ has no reason to be a subspace of $V$ or $V^*$ (it was an arbitrary
subset). Actually, we will see that the inclusion is an equality if $S$ is a subspace
of $V$ or $V^*$ and $V$ is finite dimensional.
The fundamental theorem concerning duality of vector spaces is the following:

Theorem 6.22. Let $V$ be a finite dimensional vector space over $F$. Then for all
subspaces $W$ of $V$ or $V^*$ we have

$$\dim W + \dim W^\perp = \dim V.$$

Proof. Let $n = \dim V$. Let $W$ be a subspace of $V$, of dimension $m \le n$, and
let $e_1, \dots, e_m$ be a basis of $W$, completed to a basis $e_1, \dots, e_n$ of $V$. We need to
prove that $\dim W^\perp = n - m$. Let $e_1^*, \dots, e_n^*$ be the dual basis of $V^*$ associated
with $e_1, \dots, e_n$. We will prove that $e_{m+1}^*, \dots, e_n^*$ is a basis of $W^\perp$, which will
prove the equality $\dim W^\perp = n - m$. First, notice that $e_{m+1}^*, \dots, e_n^*$ belong to
$W^\perp$, since $e_j^*$ vanishes at $e_1, \dots, e_m$ for all $m < j \le n$, thus it vanishes on
$W = \operatorname{Span}(e_1, \dots, e_m)$.

Since $e_{m+1}^*, \dots, e_n^*$ form a subfamily of the linearly independent family
$e_1^*, \dots, e_n^*$, it suffices to prove that they span $W^\perp$. Let $l \in W^\perp$, so that $l$ vanishes
on $W$; in particular $\langle l, e_i\rangle = 0$ for $1 \le i \le m$. Using Remark 6.5, we obtain

$$l = \sum_{i=1}^n \langle l, e_i\rangle e_i^* = \sum_{i=m+1}^n \langle l, e_i\rangle e_i^* \in \operatorname{Span}(e_{m+1}^*, \dots, e_n^*)$$

and the proof of the equality $\dim W^\perp = n - m$ is finished.

Suppose now that $W$ is a subspace of $V^*$. By definition $W^\perp$ consists of vectors
$v \in V$ such that $\langle l, v\rangle = 0$ for all $l \in W$. Let $\iota: V \to V^{**}$ be the canonical biduality
map. The equality $\langle l, v\rangle = 0$ is equivalent to $\langle \iota(v), l\rangle = 0$. Thus $v \in W^\perp$ if and only
if $\iota(v) \in (V^*)^*$ vanishes on $W$. Since $\iota$ is an isomorphism and since the space of
$g \in (V^*)^*$ which vanish on $W$ has dimension $\dim V^* - \dim W = \dim V - \dim W$
by the first paragraph (applied to the subspace $W$ of the finite dimensional space $V^*$),
we conclude that $\dim W^\perp = \dim V - \dim W$, finishing the proof of the theorem. □


Let us also mention the following very important consequence of the previous
theorem: we can recover a subspace of a finite dimensional vector space (or of its dual)
from its orthogonal:

Corollary 6.23. Let $V$ be a finite dimensional vector space over $F$ and let $W$ be a
subspace of $V$ or $V^*$. Then $(W^\perp)^\perp = W$.

Proof. By Problem 6.20 we have an inclusion $W \subset (W^\perp)^\perp$. By the previous
theorem

$$\dim (W^\perp)^\perp = \dim V - \dim W^\perp = \dim W.$$

Since $W$ is a subspace of $(W^\perp)^\perp$ of the same finite dimension, we must have $(W^\perp)^\perp = W$. □


The previous result allows us to give equations for a subspace $W$ of a finite
dimensional vector space $V$ over $F$. Indeed, let $n = \dim V$ and $p = \dim W$, thus
$\dim W^\perp = n - p$ by the previous theorem. Let $l_1, \dots, l_{n-p}$ be a basis of $W^\perp$. Then
by the previous corollary

$$W = (W^\perp)^\perp = \{v \in V \mid l_1(v) = \dots = l_{n-p}(v) = 0\}.$$

If $e_1, \dots, e_n$ is a fixed basis of $V$, the linear form $l_i$ is of the form

$$l_i(x_1e_1 + \dots + x_ne_n) = a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n$$

for some $a_{ij} \in F$. We deduce that

$$W = \{x_1e_1 + \dots + x_ne_n \in V \mid a_{i1}x_1 + \dots + a_{in}x_n = 0 \text{ for all } 1 \le i \le n - p\},$$

in other words $W$ can be defined by $n - p$ equations, which are linearly independent
(since $l_1, \dots, l_{n-p}$ form a basis of $W^\perp$ and thus are linearly independent).
Moreover, one can actually write down these equations explicitly if we know the
coefficients $a_{ij}$, in other words if we can find a basis of $W^\perp$. But if $W$ is given,
then we have already explained how to compute $W^\perp$, and we also know how to
compute a basis of a given vector space, thus all the previous steps can actually be
implemented in practice (we will see a concrete example in a few moments).
Conversely, if $l_1, \dots, l_{n-p}$ are linearly independent linear forms on $V$, then

$$Z = \{v \in V \mid l_1(v) = \dots = l_{n-p}(v) = 0\}$$

is a vector subspace of $V$ of dimension $p$, since

$$Z = (\operatorname{Span}(l_1, \dots, l_{n-p}))^\perp,$$

thus by Theorem 6.22

$$\dim Z = n - \dim\operatorname{Span}(l_1, \dots, l_{n-p}) = n - (n - p) = p.$$


We can summarize the previous discussion in the following fundamental theorem:

Theorem 6.24. Let $V$ be a vector space of dimension $n$ over a field.

a) If $W$ is a subspace of $V$ of dimension $p$, then we can find linearly independent
linear forms $l_1, \dots, l_{n-p}$ on $V$ such that

$$W = \{v \in V \mid l_1(v) = \dots = l_{n-p}(v) = 0\}.$$

We say that $l_1(v) = \dots = l_{n-p}(v) = 0$ are equations of $W$ (of course, there are
many possible equations for $W$!).
b) Conversely, if $l_1, \dots, l_{n-p}$ are linearly independent linear forms on $V$, then

$$W = \{v \in V \mid l_1(v) = \dots = l_{n-p}(v) = 0\}$$

is a subspace of dimension $p$ of $V$.

With the above notations, the case $p = n - 1$ is particularly important and
deserves a separate definition:
Definition 6.25. Let $V$ be a finite dimensional vector space over $F$. A subspace $W$
of $V$ is called a hyperplane if

$$\dim W = \dim V - 1.$$

For instance, the hyperplanes in $\mathbb{R}^2$ are the subspaces of dimension 1, i.e., the
lines. On the other hand, the hyperplanes in $\mathbb{R}^3$ are the subspaces of dimension 2,
i.e., planes spanned by two linearly independent vectors (this really corresponds to
the geometric intuition). There are several possible definitions of a hyperplane and
actually the previous one, though motivated by the previous theorem, is not the most
natural one since it does not say anything about the case of infinite dimensional
vector spaces. The most general and useful definition of a hyperplane in a (not
necessarily finite dimensional) vector space $V$ over $F$ is that of a subspace $W$
of $V$ of the form $\ker l$, where $l$ is a nonzero linear form on $V$. In other words,
hyperplanes are precisely the kernels of nonzero linear forms. Of course, this
new definition is equivalent to the previous one in the case of finite dimensional
vector spaces (for instance, by the rank-nullity theorem or by the previous theorem).
It also shows that the hyperplanes in $F^n$ are precisely the subspaces of the form

$$H = \{(x_1, \dots, x_n) \in F^n \mid a_1x_1 + \dots + a_nx_n = 0\}$$

for some nonzero vector $(a_1, \dots, a_n) \in F^n$. In general, if $e_1, \dots, e_n$ is a basis of $V$,
then the hyperplanes in $V$ are precisely the subspaces of the form

$$H = \{v = x_1e_1 + \dots + x_ne_n \in V \mid a_1x_1 + \dots + a_nx_n = 0\}$$

for some nonzero vector $(a_1, \dots, a_n) \in F^n$.

Notice that if $H$ is a hyperplane in a finite dimensional vector space $V$, then $H^\perp$ has
dimension 1, thus it is a line in $V^*$.

We say that hyperplanes $H_1, \dots, H_k$ are linearly independent if they are the
kernels of a linearly independent family of linear forms. The previous theorem can
be rewritten as:

Theorem 6.26. a) Any subspace of dimension $p$ in a vector space $V$ of dimension $n$
is the intersection of $n - p$ linearly independent hyperplanes of $V$.

b) Conversely, the intersection of $n - p$ linearly independent hyperplanes in a vector
space of dimension $n$ is a subspace of dimension $p$.


We end this section with two concrete problems:
Problem 6.27. Let $W$ be the subspace of $\mathbb{R}^4$ spanned by the vectors

$$v_1 = (1, 1, -1, 0) \quad \text{and} \quad v_2 = (-1, 2, -1, 1).$$

Find equations for $W$.

Solution. Here $V = \mathbb{R}^4$ and $e_1, e_2, e_3, e_4$ is the canonical basis of $V$. As the
discussion above shows, the problem comes down to finding a basis of $W^\perp$. Now
$W^\perp$ consists of those linear forms

$$l(x, y, z, t) = ax + by + cz + dt$$

which vanish on $v_1$ and $v_2$, i.e., such that

$$a + b - c = 0, \quad -a + 2b - c + d = 0.$$

We obtain

$$c = a + b, \quad d = a + c - 2b = 2a - b$$

and so

$$l(x, y, z, t) = ax + by + (a + b)z + (2a - b)t = a(x + z + 2t) + b(y + z - t).$$

We deduce that a basis of $W^\perp$ is given by

$$l_1(x, y, z, t) = x + z + 2t \quad \text{and} \quad l_2(x, y, z, t) = y + z - t.$$

As we have already seen above, we have

$$W = \{v \in V \mid l_1(v) = l_2(v) = 0\} = \{(x, y, z, t) \in \mathbb{R}^4 \mid x + z + 2t = y + z - t = 0\}$$

and $l_1(v) = l_2(v) = 0$ are equations for $W$. □
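Such an answer can be double-checked without redoing the computation. If $l_1, l_2$ vanish on $v_1, v_2$, then $W \subset \{l_1 = l_2 = 0\}$. When both $v_1, v_2$ and $l_1, l_2$ are linearly independent, Theorem 6.24 b) shows that the right-hand side has dimension $4 - 2 = 2 = \dim W$, so the two spaces coincide. Below is a minimal Python sketch of this check, with ranks computed by exact Gaussian elimination; `rank` and `pair` are hypothetical names.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of rational row vectors, by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    if not m:
        return 0
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][c] / m[rk][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def pair(form, vector):
    """<l, v> for l given by its coefficients and v by its coordinates."""
    return sum(a * x for a, x in zip(form, vector))


n = 4
spanning = [(1, 1, -1, 0), (-1, 2, -1, 1)]   # v1, v2 from Problem 6.27
equations = [(1, 0, 1, 2), (0, 1, 1, -1)]    # l1 = x + z + 2t, l2 = y + z - t

vanish = all(pair(l, v) == 0 for l in equations for v in spanning)
p = rank(spanning)
print(vanish, p, rank(equations) == n - p)
```

It prints `True 2 True`. This output means that $l_1$ and $l_2$ vanish on $W$, that $\dim W = 2$, and that the two equations are independent.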


Problem 6.28. Let $V = \mathbb{R}_3[X]$. Write the vector subspace $W$ of $V$ spanned by $1 + X$
and $1 - X + X^3$ as the intersection of 2 linearly independent hyperplanes.

Solution. Consider the canonical basis

$$e_1 = 1, \quad e_2 = X, \quad e_3 = X^2, \quad e_4 = X^3$$

of $V$ and

$$v_1 = 1 + X = e_1 + e_2, \quad v_2 = 1 - X + X^3 = e_1 - e_2 + e_4.$$

Writing $W = \operatorname{Span}(v_1, v_2)$ as the intersection of 2 linearly independent hyperplanes
is equivalent to finding two linearly independent equations defining $W$, say $l_1(v) = l_2(v) = 0$, as then

$$W = H_1 \cap H_2, \quad \text{where } H_i = \ker l_i.$$

Thus we are reduced to finding a basis $l_1, l_2$ of $W^\perp$. A linear form $l$ on $V$ is of
the form

$$l(x_1e_1 + x_2e_2 + x_3e_3 + x_4e_4) = ax_1 + bx_2 + cx_3 + dx_4$$

for some real numbers $a, b, c, d$. This linear form belongs to $W^\perp$ if and only if
$l(v_1) = l(v_2) = 0$, which is equivalent to

$$a + b = a - b + d = 0.$$

This gives $b = -a$ and $d = -2a$, that is

$$l(x_1e_1 + \dots + x_4e_4) = a(x_1 - x_2 - 2x_4) + cx_3.$$

We deduce that a basis $l_1, l_2$ of $W^\perp$ is given by

$$l_1(x_1e_1 + \dots + x_4e_4) = x_1 - x_2 - 2x_4, \qquad l_2(x_1e_1 + \dots + x_4e_4) = x_3$$

and so $W$ is the intersection of the two linearly independent hyperplanes

$$H_1 = \ker l_1 = \{a + bX + cX^2 + dX^3 \in V \mid a - b - 2d = 0\}$$

and

$$H_2 = \ker l_2 = \{a + bX + cX^2 + dX^3 \in V \mid c = 0\}.$$


6.2.1 Problems for Practice 


1. Consider the linear forms

$$l_1(x, y) = x - 2y, \quad l_2(x, y) = 2x + 3y$$

on $\mathbb{R}^2$. Give a basis of $S^\perp$, where $S = \{l_1, l_2\}$.
2. Give a basis of $S^\perp$, where $S$ consists of the linear forms

$$l_1(x, y, z) = x + y - z, \quad l_2(x, y, z) = 2x - 3y + z, \quad l_3(x, y, z) = 3x - 2y$$

on $\mathbb{R}^3$.
3. Find a basis of $W^\perp$, where

$$W = \{(x, y, z, t) \in \mathbb{R}^4 \mid x + 2y + z - t = 0\}.$$

4. Let $S = \{v_1, v_2, v_3\}$, where

$$v_1 = (0, 1, 1), \quad v_2 = (1, 1, 0), \quad v_3 = (3, 5, 2).$$

Describe $S^\perp$.
5. Give equations for the subspace of $\mathbb{R}^4$ spanned by

$$v_1 = (1, -2, 2, -1), \quad v_2 = (-1, 0, 4, -2).$$

6. a) Find the dimension $p$ of the subspace $W$ of $\mathbb{R}^4$ spanned by

$$v_1 = (1, 2, -2, 1), \quad v_2 = (-1, 2, 0, -3), \quad v_3 = (0, 4, -2, -2).$$

b) Write $W$ as an intersection of $4 - p$ linearly independent hyperplanes.
c) Can we write $W$ as the intersection of $3 - p$ hyperplanes?

7. Let $V = M_n(\mathbb{R})$ and for each $A \in V$ consider the map

$$l_A: V \to \mathbb{R}, \quad l_A(B) = \operatorname{Tr}(AB).$$

a) Prove that $l_A \in V^*$ for all $A \in V$.
b) Prove that the map

$$V \to V^*, \quad A \mapsto l_A$$

is a bijective linear map (thus an isomorphism of vector spaces).
c) Let $S_n$ and $A_n$ be the subspaces of $V$ consisting of symmetric, respectively
skew-symmetric matrices. Prove that

$$S_n^\perp = \{l_A \mid A \in A_n\} \quad \text{and} \quad A_n^\perp = \{l_A \mid A \in S_n\}.$$

8. Let $V$ be the space of polynomials with real coefficients and let $W$ be the
subspace of $V^*$ spanned by the linear forms $(l_n)_{n\ge0}$, where $l_n(P) = P^{(n)}(0)$.
Prove that $W^\perp = \{0\}$, but $W \ne V^*$. Thus if $W$ is a subspace of $V^*$, we do not
always have $(W^\perp)^\perp = W$ (this is the case if $V$ is finite dimensional, or, more
generally, if $W$ is finite dimensional).

9. Let $l$ be a linear form on $M_n(\mathbb{R})$ such that

$$l(AB) = l(BA)$$

for all $A, B \in M_n(\mathbb{R})$. Let $(E_{ij})_{1\le i,j\le n}$ be the canonical basis of $M_n(\mathbb{R})$.

a) Prove that $l(E_{11}) = \dots = l(E_{nn})$. Hint: for $i \ne j$, $E_{ij}E_{ji} = E_{ii}$ and
$E_{ji}E_{ij} = E_{jj}$.

b) Prove that $l(E_{ij}) = 0$ for $i \ne j$. Hint: $E_{ii}E_{ij} = E_{ij}$ and $E_{ij}E_{ii} = O_n$.

c) Deduce that there is a real number $c$ such that

$$l(A) = c\cdot\operatorname{Tr}(A) \quad \text{for all } A \in M_n(\mathbb{R}).$$

10. Using the previous problem, determine the span of the set of matrices of the
form $AB - BA$, with $A, B \in M_n(\mathbb{R})$ (hint: consider the orthogonal of the span).
11. Let $V$ be a vector space and let $W_1, W_2$ be subspaces of $V$ or $V^*$. Prove that

$$(W_1 + W_2)^\perp = W_1^\perp \cap W_2^\perp.$$

12. Let $V$ be a finite dimensional vector space and let $W_1$ and $W_2$ be subspaces of
$V$. Prove that

$$(W_1 \cap W_2)^\perp = W_1^\perp + W_2^\perp.$$

Hint: use the previous problem and Corollary 6.23.

13. Let $W_1, W_2$ be complementary subspaces in a finite dimensional vector space $V$
over a field $F$. Prove that $W_1^\perp$ and $W_2^\perp$ are complementary subspaces in $V^*$.

14. Let $H_1, H_2$ be distinct hyperplanes in a vector space $V$ of dimension $n \ge 2$ over
$\mathbb{R}$. Find $\dim(H_1 \cap H_2)$.

15. Prove that a nonzero finite dimensional vector space over $\mathbb{R}$ is not the union of
finitely many hyperplanes.

16. Prove that the hyperplanes in $M_n(\mathbb{R})$ are precisely the subspaces of the form

$$\{X \in M_n(\mathbb{R}) \mid \operatorname{Tr}(AX) = 0\}$$

for some nonzero matrix $A \in M_n(\mathbb{R})$.

17. Let $W$ be a subspace of dimension $p$ in a vector space $V$ of dimension $n$. Prove
that the minimal number of hyperplanes whose intersection is $W$ is $n - p$.

18. Let $V$ be a finite dimensional vector space and let $l, l_1, \dots, l_p \in V^*$ be linear
forms. Prove that $l \in \operatorname{Span}(l_1, \dots, l_p)$ if and only if $\bigcap_{i=1}^p \ker l_i \subset \ker l$.

6.3 The Transpose of a Linear Transformation 


Let $V, W$ be vector spaces over a field $F$ and let $T: V \to W$ be a linear
transformation. For each $l \in W^*$ we can consider the composite $l \circ T: V \to F$,
which is a linear form on $V$. We therefore obtain a map

$${}^tT: W^* \to V^*, \qquad {}^tT(l) = l \circ T.$$

In terms of the canonical pairing between $V$ and $V^*$, and between $W$ and $W^*$,
we have

$$\langle {}^tT(l), v\rangle = \langle l, T(v)\rangle$$

for all $l \in W^*$ and $v \in V$. We call ${}^tT$ the transpose of the linear transformation $T$.
If $V$ and $W$ are finite dimensional, the following theorem completely elucidates
the map ${}^tT$:
Theorem 6.29. Let $T : V \to W$ be a linear transformation between finite dimensional vector spaces and let $\mathcal{B}$ and $\mathcal{B}'$ be bases of $V$ and $W$ respectively. If $A$ is the matrix of $T$ with respect to $\mathcal{B}$ and $\mathcal{B}'$, then the matrix of ${}^tT : W^* \to V^*$ with respect to the dual bases of $\mathcal{B}'$ and $\mathcal{B}$ is ${}^tA$.

Proof. Let $\mathcal{B} = (v_1, \dots, v_n)$ and $\mathcal{B}' = (w_1, \dots, w_m)$. Write $A = [a_{ij}]$ and let $C = [c_{ij}]$ be the matrix of ${}^tT$ with respect to the bases $w_1^*, \dots, w_m^*$ and $v_1^*, \dots, v_n^*$. By definition we have
$$T(v_i) = \sum_{j=1}^{m} a_{ji} w_j, \quad 1 \leq i \leq n,$$
and
$${}^tT(w_i^*) = \sum_{k=1}^{n} c_{ki} v_k^*, \quad 1 \leq i \leq m.$$
Fix $1 \leq i \leq m$ and write the last equality as
$$\sum_{k=1}^{n} c_{ki} v_k^* = w_i^* \circ T.$$
Evaluating at $v_j$, with $j \in [1, n]$ arbitrary, we obtain
$$\sum_{k=1}^{n} c_{ki} v_k^*(v_j) = \sum_{k=1}^{n} c_{ki} \delta_{kj} = c_{ji}$$
and
$$w_i^*(T(v_j)) = w_i^*\Big(\sum_{l=1}^{m} a_{lj} w_l\Big) = \sum_{l=1}^{m} a_{lj} \delta_{il} = a_{ij}.$$
Comparing the two expressions yields
$$a_{ij} = c_{ji} \quad \text{for all } i, j,$$
which is exactly saying that $C = {}^tA$. □
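The proof can be mirrored numerically. The sketch below uses conventions of our own (not from the text): vectors and linear forms are coordinate lists, a form on $W$ is given by its coordinates in the dual basis $w_1^*, \dots, w_m^*$, and the helper names `transpose`, `apply`, `compose_form_with_map` and `matrix_of_transpose` are hypothetical. It builds the matrix of ${}^tT$ column by column from ${}^tT(w_i^*) = w_i^* \circ T$, exactly as in the proof, so one can compare the result with ${}^tA$.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def apply(M, x):
    """Matrix-vector product M x (x is a list of coordinates)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def compose_form_with_map(l, A):
    """Coordinates of l o T in the dual basis v_1^*, ..., v_n^*.

    A is the m x n matrix of T: V -> W and l lists the coordinates of a
    linear form on W in the dual basis w_1^*, ..., w_m^*.  The k-th
    coordinate of l o T is (l o T)(v_k) = l(T(v_k)), and T(v_k) is the
    k-th column of A.
    """
    return [sum(li * a for li, a in zip(l, column)) for column in transpose(A)]

def matrix_of_transpose(A):
    """Matrix of tT: W* -> V* in the dual bases; column i holds tT(w_i^*)."""
    m = len(A)
    columns = [compose_form_with_map([1 if j == i else 0 for j in range(m)], A)
               for i in range(m)]
    return transpose(columns)

A = [[1, 2, 3],
     [4, 5, 6]]                  # T: F^3 -> F^2 in the canonical bases
print(matrix_of_transpose(A))    # [[1, 4], [2, 5], [3, 6]], that is tA
```

The pairing identity $\langle {}^tT(l), v \rangle = \langle l, T(v) \rangle$ can be checked the same way, by comparing `compose_form_with_map(l, A)` evaluated at `v` with `l` evaluated at `apply(A, v)`.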


The following problems establish basic properties of the correspondence
$T \mapsto {}^tT$. For linear maps between finite dimensional vector spaces, they follow
immediately from the previous theorem and properties of the transpose map on 
matrices that we have already established in the first chapter. If we want to deal with 
arbitrary vector spaces, we cannot use these results. Fortunately, the results are still 
rather easy to establish in full generality. 
Problem 6.30. Prove that for all linear transformations $T_1, T_2 : V \to W$ and all scalars $c \in F$ we have
$${}^t(T_1 + cT_2) = {}^tT_1 + c\,{}^tT_2.$$
Solution. We need to prove that if $l$ is a linear form on $W$, then
$$l \circ (T_1 + cT_2) = l \circ T_1 + c\, l \circ T_2.$$
This follows from the fact that $l$ is linear. □


Problem 6.31. a) Let $T_1 : V_1 \to V_2$ and $T_2 : V_2 \to V_3$ be linear transformations. Prove that
$${}^t(T_2 \circ T_1) = {}^tT_1 \circ {}^tT_2.$$
b) Deduce that if $T : V \to V$ is an isomorphism, then so is ${}^tT : V^* \to V^*$, and
$$({}^tT)^{-1} = {}^t(T^{-1}).$$

Solution. a) Let $l$ be a linear form on $V_3$. Then
$${}^t(T_2 \circ T_1)(l) = l \circ (T_2 \circ T_1) = (l \circ T_2) \circ T_1 = {}^tT_1(l \circ T_2) = {}^tT_1({}^tT_2(l)) = ({}^tT_1 \circ {}^tT_2)(l).$$
The result follows.
b) Since $T$ is an isomorphism, there is a linear transformation $T^{-1}$ such that $T \circ T^{-1} = T^{-1} \circ T = \mathrm{id}$. Using part a) and the obvious equality ${}^t\mathrm{id} = \mathrm{id}$, we obtain
$${}^tT \circ {}^t(T^{-1}) = \mathrm{id} = {}^t(T^{-1}) \circ {}^tT,$$
from which the result follows. □


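Problem 6.32. Let $T : V \to W$ be a linear transformation and let $\iota_V : V \to V^{**}$, $\iota_W : W \to W^{**}$ be the canonical biduality maps. Prove that
$$\iota_W \circ T = {}^t({}^tT) \circ \iota_V.$$
Solution. Let $v \in V$ and recall that $\iota_V(v) = \mathrm{ev}_v$ is evaluation at $v$. Then
$$({}^t({}^tT) \circ \iota_V)(v) = {}^t({}^tT)(\mathrm{ev}_v) = \mathrm{ev}_v \circ {}^tT.$$
The last map sends $l \in W^*$ to
$$\mathrm{ev}_v({}^tT(l)) = \mathrm{ev}_v(l \circ T) = (l \circ T)(v) = l(T(v)) = \mathrm{ev}_{T(v)}(l) = \iota_W(T(v))(l).$$
Thus
$$({}^t({}^tT) \circ \iota_V)(v) = \mathrm{ev}_v \circ {}^tT = \iota_W(T(v))$$
for all $v \in V$, which is exactly the desired equality. □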


The following technical but very important result links the transpose operation with orthogonality. This allows us to use the powerful results established in the previous section.


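Theorem 6.33. Let $T : V \to W$ be a linear transformation between finite dimensional vector spaces. We have
$$\ker({}^tT) = (\operatorname{Im} T)^{\perp}, \quad \ker T = (\operatorname{Im} {}^tT)^{\perp}$$
and
$$\operatorname{Im}({}^tT) = (\ker T)^{\perp}, \quad \operatorname{Im} T = (\ker {}^tT)^{\perp}.$$
Proof. By definition we have
$$\ker({}^tT) = \{l \in W^* \mid l \circ T = 0\} = \{l \in W^* \mid l(T(v)) = 0 \ \forall v \in V\} = \{l \in W^* \mid l(w) = 0 \ \forall w \in \operatorname{Im} T\} = (\operatorname{Im} T)^{\perp}.$$
Similarly, we have
$$(\operatorname{Im} {}^tT)^{\perp} = \{v \in V \mid {}^tT(l)(v) = 0 \ \forall l \in W^*\} = \{v \in V \mid l(T(v)) = 0 \ \forall l \in W^*\} = \{v \in V \mid T(v) = 0\} = \ker T.$$
Note that we could have also deduced this second result by applying the already established equality $\ker({}^tT) = (\operatorname{Im} T)^{\perp}$ to ${}^tT$ and using the previous problem (and the fact that $\iota_V$ and $\iota_W$ are isomorphisms).
Using what we have already established and the fact that our spaces are finite dimensional (thus we can use Corollary 6.23), we obtain
$$(\ker T)^{\perp} = ((\operatorname{Im} {}^tT)^{\perp})^{\perp} = \operatorname{Im}({}^tT).$$
We proceed similarly for the equality $\operatorname{Im}(T) = (\ker {}^tT)^{\perp}$. □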
The previous theorem allows us to give a new proof of the classical but nontrivial result that a matrix and its transpose have the same rank:

Problem 6.34. a) Let $T : V \to W$ be a linear transformation between finite dimensional vector spaces. Prove that $T$ and ${}^tT$ have the same rank.
b) Prove that if $A \in M_{m,n}(F)$, then $A$ and its transpose have the same rank.

Solution. Using Theorem 6.29, we see that b) is simply the matrix translation of a). In order to prove part a), we use Theorem 6.33, which yields
$$\operatorname{rank}({}^tT) = \dim \operatorname{Im}({}^tT) = \dim (\ker T)^{\perp}.$$
By Theorem 6.22 and the rank-nullity theorem, the last expression equals
$$\dim V - \dim \ker T = \dim \operatorname{Im} T = \operatorname{rank}(T).$$
The result follows. □
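Part b) is easy to test on examples. The sketch below (our own illustration; `rank` and `transpose` are hypothetical helper names) computes ranks by Gaussian elimination over $\mathbb{Q}$, using exact `Fraction` arithmetic to avoid rounding issues, and compares $\operatorname{rank}(A)$ with $\operatorname{rank}({}^tA)$.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0                                   # number of pivots found so far
    for c in range(len(rows[0]) if rows else 0):
        # look for a nonzero entry in column c, at or below row r
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):   # clear the entries below the pivot
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
print(rank(A), rank(transpose(A)))  # 2 2
```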


6.3.1 Problems for Practice 


In the next problems we fix a field F. 


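1. Consider the linear map
$$T : \mathbb{R}^3 \to \mathbb{R}^2, \quad T(x, y, z) = (x - 2y + 3z, x - y + z).$$
Let $e_1^*, e_2^*$ be the dual basis of the canonical basis of $\mathbb{R}^2$. Find the coordinates of the vectors ${}^tT(e_1^* - e_2^*)$ and ${}^tT(e_1^* + e_2^*)$ with respect to the dual basis of the canonical basis of $\mathbb{R}^3$.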

2. Find the matrix of ${}^tT$ with respect to the dual basis of the canonical basis of $\mathbb{R}^3$, knowing that
$$T(x, y, z) = (x - 2y + 3z, 2y - z, x - 4y + 3z).$$


3. Let $T : V \to W$ be a linear transformation between finite dimensional vector spaces over $F$. Prove that

a) $T$ is injective if and only if ${}^tT$ is surjective.
b) $T$ is surjective if and only if ${}^tT$ is injective.


4. Let $T : V \to V$ be a linear transformation on a finite dimensional vector space $V$ over $F$, and let $W$ be a subspace of $V$. Prove that $W$ is stable under $T$ if and only if $W^{\perp}$ is stable under ${}^tT$.

5. Find all planes of $\mathbb{R}^3$ which are invariant under the linear transformation
$$T : \mathbb{R}^3 \to \mathbb{R}^3, \quad T(x, y, z) = (x - 2y + z, 0, x + y + z).$$
6. Let $V$ be a finite dimensional vector space and let $T : V \to V$ be a linear transformation such that any hyperplane of $V$ is stable under $T$. Prove that $T$ is a scalar times the identity (hint: prove that any line in $V^*$ is stable under ${}^tT$).
In this section we will use the results established in the previous sections to give a simple proof of a beautiful and extremely important theorem of Jordan. This will be used later on to completely classify matrices in $M_n(\mathbb{C})$ up to similarity. Actually, the proof will be split into a series of (relatively) easy exercises, many of them having their own interest. We will work over an arbitrary field $F$ in this section, but the reader may assume that $F = \mathbb{R}$ or $\mathbb{C}$ if he/she wants.

We have seen several important classes of matrices so far: diagonal, upper-triangular, symmetric, orthogonal, etc. It is time to introduce another fundamental class of matrices and linear transformations:


Definition 6.35. a) Let $V$ be a vector space over $F$ and let $T : V \to V$ be a linear transformation. We say that $T$ is nilpotent if $T^k = 0$ for some $k \geq 1$, where $T^k = T \circ T \circ \dots \circ T$ ($k$ times). The smallest such positive integer $k$ is called the index of $T$. Thus if $k$ is the index of $T$, then $T^k = 0$ but $T^{k-1} \neq 0$.

b) A matrix $A \in M_n(F)$ is called nilpotent if $A^k = O_n$ for some $k \geq 1$. The smallest such positive integer $k$ is called the index of $A$.


If $V$ is a finite dimensional vector space over $F$, if $\mathcal{B}$ is a basis of $V$ and if $T : V \to V$ is a linear transformation whose matrix with respect to $\mathcal{B}$ is $A \in M_n(F)$, then the matrix of $T^k$ with respect to $\mathcal{B}$ is $A^k$. It follows that $T$ is nilpotent if and only if $A$ is nilpotent, and in this case the index of $T$ equals the index of $A$. In particular, any matrix similar to a nilpotent matrix is nilpotent and has the same index. This can also be proved directly using matrix manipulations: if $A$ is nilpotent, $P$ is invertible, and $B = PAP^{-1}$, then an easy induction shows that
$$B^k = PA^kP^{-1}$$
for all $k \geq 1$, thus $B^k = O_n$ if and only if $A^k = O_n$, establishing the previous statement.


Problem 6.36. Let $T_1, T_2$ be two linear transformations on a vector space $V$ and assume that $T_1 \circ T_2 = T_2 \circ T_1$. If $T_1, T_2$ are nilpotent, then so are $T_1 \circ T_2$ and $T_1 + T_2$.

Solution. Say $T_1^{k_1} = 0$ and $T_2^{k_2} = 0$ for some $k_1, k_2 \geq 1$. Then $T_1^k = T_2^k = 0$, where $k = k_1 + k_2$. Since $T_1$ and $T_2$ commute, we obtain
$$(T_1 \circ T_2)^k = T_1^k \circ T_2^k = 0$$
and
$$(T_1 + T_2)^{2k} = \sum_{i=0}^{2k} \binom{2k}{i} T_1^{2k-i} T_2^{i}.$$
For each $0 \leq i \leq k$ we have $T_1^{2k-i} = 0$ and for each $i \in [k+1, 2k]$ we have $T_2^{i} = 0$. Thus $T_1^{2k-i} T_2^{i} = 0$ for all $0 \leq i \leq 2k$ and so $(T_1 + T_2)^{2k} = 0$, establishing that $T_1 + T_2$ is nilpotent. □


Remark 6.37. 1) Similarly (and actually as a consequence of the problem), the sum and the product of two commuting nilpotent matrices are nilpotent.
2) The result of the previous problem is no longer true if we don't assume that $T_1$ and $T_2$ commute: the matrices $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$ are nilpotent, but their sum $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is not nilpotent (its square is $I_2$); similarly, their product $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ is not nilpotent (it equals its own square).

3) It follows from 2) that the nilpotent matrices in $M_n(F)$ (for $n \geq 2$) do not form a vector subspace of $M_n(F)$. A rather challenging exercise for the reader is to prove that the vector subspace of $M_n(F)$ spanned by the nilpotent matrices is precisely the set of matrices of trace 0.
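Both phenomena can be seen on small examples. The sketch below (helper names are ours and hypothetical) computes the index of a matrix by examining $A, A^2, \dots, A^n$. This relies on the bound "index $\leq n$" for nilpotent $n \times n$ matrices, which is proved in Problem 6.38 c). It confirms the non-commuting counterexamples of 2), and checks the commuting pair $J$, $J^2$ with $J = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    """Sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def nilpotency_index(A):
    """Index of A if A is nilpotent, otherwise None.

    Only the powers A, A^2, ..., A^n are examined; this relies on the
    bound 'index <= n' for nilpotent n x n matrices (Problem 6.38 c)).
    """
    n = len(A)
    power = A
    for k in range(1, n + 1):
        if all(x == 0 for row in power for x in row):
            return k
        power = matmul(power, A)
    return None

E12 = [[0, 1], [0, 0]]
E21 = [[0, 0], [1, 0]]
print(nilpotency_index(E12), nilpotency_index(E21))  # 2 2
print(nilpotency_index(matadd(E12, E21)))            # None
print(nilpotency_index(matmul(E12, E21)))            # None

J = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # J and J^2 commute
J2 = matmul(J, J)
print(nilpotency_index(matadd(J, J2)), nilpotency_index(matmul(J, J2)))  # 3 1
```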


The result established in the following problem is very important:

Problem 6.38. a) Let $T : V \to V$ be a nilpotent transformation of index $k$ and let $v \in V$ be a vector such that $T^{k-1}(v) \neq 0$. Prove that the family $(v, T(v), \dots, T^{k-1}(v))$ is linearly independent in $V$.

b) Deduce that if $V$ is finite dimensional then the index of any nilpotent transformation on $V$ does not exceed $\dim V$.

c) Prove that if $A \in M_n(F)$ is nilpotent, then its index does not exceed $n$.

Solution. a) Suppose that
$$a_0 v + a_1 T(v) + \dots + a_{k-1} T^{k-1}(v) = 0 \tag{6.1}$$
for some scalars $a_0, \dots, a_{k-1}$. Applying $T^{k-1}$ to this relation and taking into account that $T^j = 0$ for $j \geq k$ yields
$$a_0 T^{k-1}(v) + 0 + \dots + 0 = 0,$$
and since $T^{k-1}(v) \neq 0$, we obtain $a_0 = 0$. Applying now $T^{k-2}$ to relation (6.1) gives $a_1 T^{k-1}(v) = 0$ and then $a_1 = 0$. Continuing by induction yields $a_0 = \dots = a_{k-1} = 0$ and the result follows.

b) Suppose that $T$ is nilpotent on $V$, of index $k$. Since $T^{k-1} \neq 0$, there is $v \in V$ with $T^{k-1}(v) \neq 0$, and part a) shows that $V$ contains a linearly independent family with $k$ elements. Thus $\dim V \geq k$ and we are done.

c) This follows from b) applied to $V = F^n$ and the linear map $T : V \to V$ sending $X$ to $AX$ (using the discussion preceding the problem, which shows that $A$ and $T$ have the same index). □


Using the previous problem, we are ready to introduce a fundamental kind of nilpotent matrix: Jordan blocks. This is the goal of the next problem:

Problem 6.39. Let $T : V \to V$ be a nilpotent linear transformation of index $k$ on a vector space $V$, let $v \in V$ and let
$$W = \operatorname{Span}(v, T(v), \dots, T^{k-1}(v)).$$
a) Prove that $W$ is stable under $T$.

b) Prove that if $T^{k-1}(v) \neq 0$, then $T^{k-1}(v), T^{k-2}(v), \dots, T(v), v$ form a basis of $W$ (thus $\dim W = k$) and the matrix of the linear transformation $T : W \to W$ with respect to this basis is
$$J_k = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$
This matrix is called a Jordan block of size $k$ (note that $J_1 = O_1$, the $1 \times 1$ matrix with one entry equal to 0).

Solution. a) Any element of $W$ is of the form
$$w = a_0 v + a_1 T(v) + \dots + a_{k-1} T^{k-1}(v).$$
Since $T^k(v) = 0$, we have
$$T(w) = a_0 T(v) + \dots + a_{k-2} T^{k-1}(v) \in W,$$
thus $W$ is stable under $T$.
b) If $T^{k-1}(v) \neq 0$, part a) of the previous problem shows that $T^{k-1}(v), \dots, T(v), v$ is a linearly independent family and since it also spans $W$, it is a basis of $W$. Moreover, since $T(T^{k-1}(v)) = T^k(v) = 0$ and
$$T(T^i(v)) = T^{i+1}(v)$$
for $0 \leq i \leq k - 2$, it is clear that the matrix of $T : W \to W$ with respect to this basis is $J_k$. □
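A quick sanity check of Problem 6.39 on $V = F^k$ with $T$ given by $J_k$: taking $v = e_k$ (the last canonical basis vector, our choice for illustration), the family $v, J_k v, \dots, J_k^{k-1} v$ should be $e_k, e_{k-1}, \dots, e_1$ and $J_k^k v$ should vanish, so the basis $T^{k-1}(v), \dots, T(v), v$ of part b) is just the canonical basis. The helper names below are hypothetical.

```python
def jordan_block(k):
    """J_k: ones on the superdiagonal (entries (i, i+1)), zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(k)] for i in range(k)]

def apply(M, x):
    """Matrix-vector product M x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def chain(M, v, length):
    """The family v, M v, M^2 v, ..., M^(length-1) v."""
    vectors = [v]
    for _ in range(length - 1):
        vectors.append(apply(M, vectors[-1]))
    return vectors

k = 4
J = jordan_block(k)
e_k = [0] * (k - 1) + [1]        # last canonical basis vector
family = chain(J, e_k, k + 1)    # v, Jv, ..., J^k v
print(family[:k])  # [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
print(family[k])   # [0, 0, 0, 0]
```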


The main theorem concerning nilpotent linear transformations on finite dimensional vector spaces is the following beautiful result:

Theorem 6.40 (Jordan). Let $V$ be a finite dimensional vector space over a field $F$, let $n = \dim V$, and let $T : V \to V$ be a nilpotent linear transformation. Then there is a basis of $V$ with respect to which the matrix of $T$ is of the form
$$\begin{bmatrix} J_{k_1} & 0 & \cdots & 0 \\ 0 & J_{k_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_d} \end{bmatrix}$$
for some sequence of positive integers $k_1 \geq k_2 \geq \dots \geq k_d$ with
$$k_1 + \dots + k_d = n.$$
Moreover, the sequence $(k_1, \dots, k_d)$ is uniquely determined.
We can restate the previous theorem in terms of matrices:

Theorem 6.41 (Jordan). Any nilpotent matrix $A \in M_n(F)$ is similar to a block-diagonal matrix
$$\begin{bmatrix} J_{k_1} & 0 & \cdots & 0 \\ 0 & J_{k_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_d} \end{bmatrix}$$
for a unique sequence of positive integers $(k_1, \dots, k_d)$ with $k_1 \geq k_2 \geq \dots \geq k_d$ and $k_1 + \dots + k_d = n$.

This block-diagonal matrix is called the Jordan normal (or canonical) form of $A$ or $T$.
The next series of problems is devoted to the proof of this theorem. We will start with the uniqueness of the sequence $(k_1, \dots, k_d)$. The proof, given in the next three problems, will also show how to compute these integers explicitly and therefore how to find in practice the Jordan normal form of a nilpotent matrix.


Problem 6.42. Let $T$ be the linear transformation on $F^n$ associated with the Jordan block $J_n$. Prove that for all $1 \leq k \leq n - 1$ we have
$$\operatorname{rank}(T^k) = n - k$$
and deduce that
$$\operatorname{rank}(J_n^k) = n - k$$
for $1 \leq k \leq n - 1$.

Solution. If $e_1, \dots, e_n$ is the canonical basis of $F^n$, then
$$T(e_1) = 0, \quad T(e_2) = e_1, \quad T(e_3) = e_2, \quad \dots, \quad T(e_n) = e_{n-1}.$$
In other words, $T(e_i) = e_{i-1}$ for $1 \leq i \leq n$, with the convention that $e_0 = 0$. We deduce that $T^2(e_i) = T(e_{i-1}) = e_{i-2}$ for $1 \leq i \leq n$, with $e_{-1} = 0$. An immediate induction yields
$$T^j(e_i) = e_{i-j}$$
for $1 \leq j \leq n - 1$ and $1 \leq i \leq n$, with $e_r = 0$ for $r \leq 0$. Thus
$$\operatorname{Im}(T^j) = \operatorname{Span}(e_1, e_2, \dots, e_{n-j})$$
and this space has dimension $n - j$, which yields
$$\operatorname{rank}(T^k) = n - k$$
for $1 \leq k \leq n - 1$. The second part is an immediate consequence of the first part. □
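The formula $\operatorname{rank}(J_n^k) = n - k$ can be checked numerically for small $n$. The sketch below (hypothetical helper names; ranks computed by Gaussian elimination over $\mathbb{Q}$ with exact fractions) lists $\operatorname{rank}(J_n^k)$ for $k = 1, \dots, n$; the last entry is $0$ because $J_n^n = O_n$.

```python
from fractions import Fraction

def jordan_block(n):
    """J_n: ones on the superdiagonal, zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def ranks_of_powers(n):
    """[rank(J_n^k) for k = 1, ..., n]."""
    J = jordan_block(n)
    power, result = J, []
    for _ in range(n):
        result.append(rank(power))
        power = matmul(power, J)
    return result

print(ranks_of_powers(5))  # [4, 3, 2, 1, 0]
```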


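Problem 6.43. Suppose that $A \in M_n(F)$ is similar to
$$\begin{bmatrix} J_{k_1} & 0 & \cdots & 0 \\ 0 & J_{k_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_d} \end{bmatrix}.$$
Let $N_j$ be the number of terms equal to $j$ in the sequence $(k_1, \dots, k_d)$. Prove that for all $1 \leq j \leq n$
$$\operatorname{rank}(A^j) = N_{j+1} + 2N_{j+2} + \dots + (n - j)N_n.$$

Solution. If $A_1, \dots, A_d$ are square matrices, then
$$\operatorname{rank}\begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_d \end{bmatrix} = \operatorname{rank}(A_1) + \dots + \operatorname{rank}(A_d),$$
as the reader can easily check by using the fact that the rank of a matrix is the dimension of the span of its columns. The matrix $A^j$ is similar to the $j$-th power of the block-diagonal matrix above, which is block-diagonal with blocks $J_{k_1}^j, \dots, J_{k_d}^j$. Since similar matrices have the same rank, we deduce that for all $j \geq 1$ we have
$$\operatorname{rank}(A^j) = \operatorname{rank}\begin{bmatrix} J_{k_1}^j & 0 & \cdots & 0 \\ 0 & J_{k_2}^j & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_d}^j \end{bmatrix} = \sum_{i=1}^{d} \operatorname{rank}(J_{k_i}^j).$$
By the previous problem, $\operatorname{rank}(J_{k_i}^j)$ equals $k_i - j$ if $j < k_i$ and $0$ otherwise. Thus, since $N_t$ is the number of indices $i$ for which $k_i = t$, we have
$$\sum_{i=1}^{d} \operatorname{rank}(J_{k_i}^j) = \sum_{t > j} \sum_{i : k_i = t} \operatorname{rank}(J_t^j) = \sum_{t > j} N_t (t - j) = N_{j+1} + 2N_{j+2} + \dots + (n - j)N_n.$$
□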
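Problem 6.44. Prove that if $k_1 \geq \dots \geq k_d$ and $k'_1 \geq \dots \geq k'_{d'}$ are sequences of positive integers adding up to $n$ and such that
$$A = \begin{bmatrix} J_{k_1} & 0 & \cdots & 0 \\ 0 & J_{k_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_d} \end{bmatrix}$$
is similar to
$$B = \begin{bmatrix} J_{k'_1} & 0 & \cdots & 0 \\ 0 & J_{k'_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k'_{d'}} \end{bmatrix},$$
then these sequences are equal. This is the uniqueness part of Jordan's theorem.

Solution. Let $N_j$ be the number of terms equal to $j$ in the sequence $(k_1, \dots, k_d)$, and define similarly $N'_j$ for the sequence $(k'_1, \dots, k'_{d'})$. We are asked to prove that
$$N_j = N'_j \quad \text{for } 1 \leq j \leq n.$$
Since $A$ and $B$ are similar, $A^j$ and $B^j$ are similar for all $j \geq 1$, thus they have the same rank. Using the previous problem, we deduce that
$$N_{j+1} + 2N_{j+2} + \dots + (n - j)N_n = N'_{j+1} + 2N'_{j+2} + \dots + (n - j)N'_n$$
for $j \geq 1$. Setting $j = n - 1$ gives $N_n = N'_n$, then setting $j = n - 2$ and using $N_n = N'_n$ gives $N_{n-1} = N'_{n-1}$. Continuing this way yields $N_j = N'_j$ for $2 \leq j \leq n$.

We still need to prove that $N_1 = N'_1$, but this follows from
$$N_1 + 2N_2 + \dots + nN_n = N'_1 + 2N'_2 + \dots + nN'_n = n,$$
since
$$k_1 + \dots + k_d = k'_1 + \dots + k'_{d'} = n. \qquad \square$$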


Remark 6.45. The previous two problems show how to compute the sequence $(k_1, \dots, k_d)$ in practice. Namely, we are reduced to computing $N_1, \dots, N_n$. For this, we use the relations
$$\operatorname{rank}(A^j) = N_{j+1} + 2N_{j+2} + \dots + (n - j)N_n$$
for $1 \leq j \leq n$ (it suffices to take $j \leq k$ if $A$ has index $k$, noting that the relation for $j = k$ already yields $N_{k+1} = \dots = N_n = 0$). These relations completely determine $N_2, \dots, N_n$. To find $N_1$, we use the relation
$$N_1 + 2N_2 + \dots + nN_n = n.$$
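The procedure of this remark is a back-substitution, which can be sketched as follows. We assume the ranks $\operatorname{rank}(A^j)$ are already known (for instance from a rank routine); the function names `block_counts_from_ranks` and `jordan_sizes` are hypothetical. In the relation for $\operatorname{rank}(A^j)$ the coefficient of $N_{j+1}$ is $1$, so one can solve for $N_n, N_{n-1}, \dots, N_2$ in this order, and then obtain $N_1$ from $N_1 + 2N_2 + \dots + nN_n = n$.

```python
def block_counts_from_ranks(n, ranks):
    """Solve the relations of Remark 6.45 for N_1, ..., N_n.

    ranks[j] must be rank(A^j) for 1 <= j <= n - 1 (ranks[0] is not used).
    Returns {t: N_t} for the block sizes t that actually occur.
    """
    N = [0] * (n + 1)                     # N[t] for 0 <= t <= n; N[0] unused
    # rank(A^j) = sum_{t > j} (t - j) N_t, and the coefficient of N_{j+1}
    # is 1, so we solve for N_n, N_{n-1}, ..., N_2 in this order.
    for j in range(n - 1, 0, -1):
        tail = sum((t - j) * N[t] for t in range(j + 2, n + 1))
        N[j + 1] = ranks[j] - tail
    # finally N_1 + 2 N_2 + ... + n N_n = n
    N[1] = n - sum(t * N[t] for t in range(2, n + 1))
    return {t: N[t] for t in range(1, n + 1) if N[t] != 0}

def jordan_sizes(counts):
    """Block sizes k_1 >= k_2 >= ... listed with multiplicity."""
    return [t for t in sorted(counts, reverse=True) for _ in range(counts[t])]

# The data of Example 6.46: n = 4, rank(A) = 2, rank(A^2) = 1, A^3 = O_4.
counts = block_counts_from_ranks(4, [4, 2, 1, 0])
print(counts, jordan_sizes(counts))   # {1: 1, 3: 1} [3, 1]
```

For instance, the ranks of the powers of the block-diagonal matrix with blocks $J_3, J_2$ are $3, 1, 0, 0$, and the same function recovers the sizes $[3, 2]$.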


Example 6.46. As a concrete example, consider the matrix 


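One can easily check that this matrix is nilpotent: using the product rule we compute
$$A^2 = \begin{bmatrix} 0 & 0 & -4 & 8 \\ 0 & 0 & -4 & 8 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
and then $A^3 = O_4$, using again the product rule. Thus $A$ is nilpotent of index $k = 3$. It follows that $N_4 = 0$ and
$$N_1 + 2N_2 + 3N_3 = 4.$$
Next, it is easy to see that the rank of $A$ is $2$, since the first and second rows are identical, the last row is half the third row, and the first and third rows are linearly independent. Thus
$$2 = \operatorname{rank}(A) = N_2 + 2N_3 + 3N_4 = N_2 + 2N_3.$$
Next, it is clear that $A^2$ has rank $1$, thus
$$1 = \operatorname{rank}(A^2) = N_3 + 2N_4 = N_3.$$
It follows that $N_3 = 1$, $N_2 = 2 - 2N_3 = 0$ and $N_1 = 4 - 2N_2 - 3N_3 = 1$, and so the Jordan normal form of $A$ is
$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} J_3 & 0 \\ 0 & J_1 \end{bmatrix}.$$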
The uniqueness part of Jordan's theorem being proved, it remains to prove the existence part, which is much harder. The basic idea is, however, not very surprising: we work by strong induction on $\dim V$, the case $\dim V = 1$ being clear (as then $T = 0$). Assume that the result holds for $\dim V < n$ and let us consider the case $\dim V = n$. We may assume that $T \neq 0$, otherwise we are done. Let $k_1 = k$ be the index of $T$ and let $v \in V$ be such that $T^{k-1}(v) \neq 0$. By Problem 6.39, the subspace
$$W = \operatorname{Span}(v, T(v), \dots, T^{k-1}(v))$$
is invariant under $T$, which acts on it as the matrix $J_k$ acts on $F^k$. Moreover, $\dim W = k$. If $k = n$, then we are done. If not, we look for a complementary subspace $W'$ of $W$ which is stable under $T$. If we could find such a space $W'$, then we could apply the inductive hypothesis to the map $T : W' \to W'$ (note that its index does not exceed $k_1$) and find a basis of $W'$ in which the matrix of $T$ has the desired form. Patching together the basis $T^{k-1}(v), \dots, T(v), v$ of $W$ and this basis of $W'$ would yield the desired basis of $V$ and would finish the inductive proof. The key difficulty is proving the existence of $W'$. This will be done in the two problems below.


Problem 6.47. a) Prove that if $A \in M_n(F)$ is nilpotent, then ${}^tA$ is nilpotent and has the same index as $A$.

b) Suppose that $V$ is a finite dimensional vector space over $F$. Prove that if $T : V \to V$ is nilpotent, then ${}^tT : V^* \to V^*$ is also nilpotent and has the same index as $T$.

Solution. a) For all $k \geq 1$ we have
$$({}^tA)^k = {}^t(A^k),$$
thus $({}^tA)^k = O_n$ if and only if $A^k = O_n$. The result follows.

b) Let $\mathcal{B}$ be a basis of $V$ and let $\mathcal{B}^*$ be the dual basis of $\mathcal{B}$. If $A$ is the matrix of $T$ with respect to $\mathcal{B}$, then the matrix of ${}^tT$ with respect to $\mathcal{B}^*$ is ${}^tA$, by Theorem 6.29. The result now follows from part a).

We can also prove this directly as follows. By Problem 6.31 we have $({}^tT)^k = {}^t(T^k)$, so if $k \geq 1$, then $({}^tT)^k = 0$ if and only if $({}^tT)^k(l) = 0$ for all $l \in V^*$, equivalently $l \circ T^k = 0$ for all $l \in V^*$. This can be written as: for all $v \in V$ and all $l \in V^*$ we have $l(T^k(v)) = 0$. Now, the assertion that $l(T^k(v)) = 0$ for all $l \in V^*$ is equivalent to $T^k(v) = 0$, by injectivity of the biduality map $V \to V^{**}$. Thus $({}^tT)^k = 0$ if and only if $T^k = 0$, and this holds even when $V$ is infinite dimensional. In other words, part b) holds in full generality (but the proof requires the injectivity of the map $V \to V^{**}$, which is difficult and was not proved for infinite dimensional vector spaces). □
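Part a) can be illustrated on a small example. The sketch below (hypothetical helper names, same index computation as before, which examines only the powers $A, \dots, A^n$ thanks to Problem 6.38 c)) compares the index of an upper-triangular nilpotent matrix with that of its transpose, and checks the identity $({}^tA)^2 = {}^t(A^2)$ used in the solution.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def nilpotency_index(A):
    """Index of A if A is nilpotent (checking A, ..., A^n), else None."""
    power = A
    for k in range(1, len(A) + 1):
        if all(x == 0 for row in power for x in row):
            return k
        power = matmul(power, A)
    return None

A = [[0, 1, 2],
     [0, 0, 3],
     [0, 0, 0]]
print(nilpotency_index(A), nilpotency_index(transpose(A)))   # 3 3
print(matmul(transpose(A), transpose(A)) == transpose(matmul(A, A)))  # True
```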


Problem 6.48. Let $T : V \to V$ be a nilpotent transformation of index $k$ on a finite dimensional vector space $V$, let $v \in V$ be such that $T^{k-1}(v) \neq 0$, and let $W = \operatorname{Span}(v, T(v), \dots, T^{k-1}(v))$ as in the discussion above. For simplicity we denote $S = {}^tT : V^* \to V^*$ and we recall that $S$ is nilpotent of index $k$ by the previous problem.

a) Explain why we can find a linear form $l \in V^*$ such that
$$l(T^{k-1}(v)) \neq 0.$$
b) Prove that the orthogonal $W'$ of
$$Z = \operatorname{Span}(l, S(l), \dots, S^{k-1}(l)) \subset V^*$$
is stable under $T$.

c) Prove that $\dim W' + \dim W = \dim V$.

d) Deduce that $W' \oplus W = V$, thus $W'$ is a complementary subspace of $W$, stable under $T$. This finishes the proof of Jordan's theorem!

Solution. a) This is a direct consequence of the injectivity (and actually bijectivity, since our space is finite dimensional) of the biduality map $V \to V^{**}$.

b) Let us try to understand concretely the space $Z^{\perp}$. A vector $x$ is in $Z^{\perp}$ if and only if $S^j(l)(x) = 0$ for $0 \leq j \leq k - 1$. Since $S = {}^tT$, we have
$$S^j(l)(x) = (l \circ T^j)(x) = l(T^j(x)),$$
thus
$$Z^{\perp} = \{x \in V \mid l(T^j(x)) = 0 \text{ for all } 0 \leq j \leq k - 1\}.$$
Now let $x \in Z^{\perp}$ and let us prove that $T(x) \in Z^{\perp}$, i.e., that
$$l(T^j(T(x))) = 0$$
for $0 \leq j \leq k - 1$, or equivalently $l(T^j(x)) = 0$ for $1 \leq j \leq k$. This is clear for $1 \leq j \leq k - 1$, since $x \in Z^{\perp}$, and it is true for $j = k$ since by assumption $T^k = 0$.
c) By Theorem 6.22 we have
$$\dim W' = \dim Z^{\perp} = \dim V^* - \dim Z = \dim V - \dim Z.$$
It therefore suffices to prove that $\dim Z = \dim W$. Now $\dim W = k$ by Problem 6.39, and $\dim Z = k$ by the same problem applied to $V^*$, $S$ (which is nilpotent of index $k$) and $l$ (note that $S^{k-1}(l) = l \circ T^{k-1} \neq 0$ since $l(T^{k-1}(v)) \neq 0$). Thus $\dim W' + \dim W = \dim V$.
d) By part c) it suffices to prove that $W' \cap W = \{0\}$. Let $w \in W$ and write
$$w = a_0 v + a_1 T(v) + \dots + a_{k-1} T^{k-1}(v)$$
for some scalars $a_0, \dots, a_{k-1}$. Suppose that $w \in W'$, thus $w \in Z^{\perp}$, that is $l(T^j(w)) = 0$ for $0 \leq j \leq k - 1$. Taking $j = k - 1$ and using the fact that $T^m = 0$ for $m \geq k$ yields
$$a_0\, l(T^{k-1}(v)) = 0.$$
Since $l(T^{k-1}(v)) \neq 0$, we must have $a_0 = 0$. Taking $j = k - 2$ gives similarly $a_1\, l(T^{k-1}(v)) = 0$ and so $a_1 = 0$. Continuing like this we obtain $a_0 = \dots = a_{k-1} = 0$ and so $w = 0$. This finishes the solution of the problem. □


6.4.1 Problems for Practice 


In the problems below F is a field. 


1. Let $T : V \to V$ be a linear transformation on a finite dimensional vector space such that for all $v \in V$ there is a positive integer $k$ such that $T^k(v) = 0$. Prove that $T$ is nilpotent.

2. Let $V$ be the space of polynomials with real coefficients and let $T : V \to V$ be the map sending a polynomial to its derivative. Prove that for all $v \in V$ there is a positive integer $k$ such that $T^k(v) = 0$, but $T$ is not nilpotent.

3. Describe the possible Jordan normal forms for a nilpotent matrix $A \in M_4(F)$.

4. Find, up to similarity, all nilpotent $3 \times 3$ matrices with real entries.

5. A nilpotent matrix $A \in M_5(\mathbb{C})$ satisfies $\operatorname{rank}(A) = 3$ and $\operatorname{rank}(A^2) = 1$. Find its Jordan normal form.

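6. a) Prove that the matrix
$$A = \begin{bmatrix} 3 & 1 & 3 \\ 2 & 0 & 2 \\ -3 & -1 & -3 \end{bmatrix}$$
is nilpotent and find its index.
b) Find the Jordan normal form of $A$.

7. Find the Jordan normal form of the matrix
$$A = \begin{bmatrix} -1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & -1 & 0 \end{bmatrix}.$$

8. Consider the matrix
$$A = \begin{bmatrix} 3 & -1 & 1 & -7 \\ 9 & -3 & -7 & -1 \\ 0 & 0 & 4 & -8 \\ 0 & 0 & 2 & -4 \end{bmatrix}.$$
a) Prove that $A$ is nilpotent.
b) Find its Jordan normal form.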


9. Describe up to similarity all matrices $A \in M_n(F)$ such that $A^2 = O_n$.

10. Let $A \in M_n(F)$ be a nilpotent matrix. Prove that $A$ has index $n$ if and only if $\operatorname{rank}(A) = n - 1$.

11. Let $A \in M_n(F)$ be a nilpotent matrix, say $A^k = O_n$ for some $k \geq 1$. Prove that $I_n + xA$ is invertible for all $x \in F$ and
$$(I_n + xA)^{-1} = I_n - xA + x^2A^2 - \dots + (-1)^{k-1}x^{k-1}A^{k-1}.$$

12. (Fitting decomposition) Let $V$ be a finite dimensional vector space over a field $F$ and let $T : V \to V$ be a linear transformation. Write
$$N = \bigcup_{k \geq 1} \ker T^k, \quad I = \bigcap_{k \geq 1} \operatorname{Im}(T^k).$$
a) Prove that $N$ and $I$ are subspaces of $V$, stable under $T$.

b) Prove that there exists $n$ such that $N = \ker T^n$ and $I = \operatorname{Im}(T^n)$.

c) Deduce that $V = N \oplus I$.

d) Prove that the restriction of $T$ to $N$ is nilpotent and the restriction of $T$ to $I$ is invertible. We call this decomposition $V = N \oplus I$ the Fitting decomposition of $T$.

e) Prove that if $V = V_1 \oplus V_2$ is a decomposition of $V$ into subspaces stable under $T$ and such that $T|_{V_1}$ is nilpotent and $T|_{V_2}$ is invertible, then $V_1 = N$ and $V_2 = I$.

13. Find the Fitting decomposition of the matrix

a= [a

Do the same with the matrix


Chapter 7 
Determinants 


Abstract This rather technical chapter is devoted to the study of determinants 
of matrices and linear transformations. These are introduced and studied via 
multilinear maps. The present chapter is rich in examples, both numerical and 
theoretical. 


Keywords Determinant · Multilinear map · Laplace expansion · Cofactor


This rather technical chapter is devoted to the study of determinants of matrices 
and linear transformations. We have already seen in the chapter devoted to square 
matrices of order 2 that determinants are absolutely fundamental in the study of 
matrices. The advantage in that case is that many key properties of the determinant 
can be checked by easy computations, while this is no longer the case for general 
matrices: it is actually not even clear what the analogue of the determinant should 
be for $n \times n$ matrices.

The definition of the determinant of a matrix is rather miraculous at first sight, 
so we spend a large part of this chapter explaining why this definition is natural 
and motivated by the study of multilinear forms (which will also play a key role 
in the last chapter of this book). Once the machinery is developed, the proofs of 
the main properties of the determinants are rather formal, while they would be very 
painful if one had to manipulate the brutal definition of a determinant as a polynomial
expression in the entries of the matrix.

Permutations play a key role in this chapter, so the reader not familiar with them should start by reading the corresponding section in the appendix dealing with algebraic preliminaries. The most important thing for us is that the set $S_n$ of permutations of $\{1, 2, \dots, n\}$ is a group of order $n!$ with respect to the composition of permutations, and there is a nontrivial homomorphism $\varepsilon : S_n \to \{-1, 1\}$, the signature.
7.1 Multilinear Maps

Let $V_1, V_2, \dots, V_d$ and $W$ be vector spaces over a field $F$ (the reader might prefer to take $\mathbb{R}$ or $\mathbb{C}$ in the sequel).

Definition 7.1. A map $f : V_1 \times \dots \times V_d \to W$ is called multilinear if for all $i \in \{1, 2, \dots, d\}$ and all $v_1 \in V_1, \dots, v_{i-1} \in V_{i-1}, v_{i+1} \in V_{i+1}, \dots, v_d \in V_d$ the map
$$V_i \to W, \quad v_i \mapsto f(v_1, \dots, v_d)$$
is linear.

Let us see what the condition really says in a few simple cases. First, if $d = 1$, then it simply says that the map $f : V_1 \to W$ is linear. Secondly, if $d = 2$, the condition is that $x \mapsto f(a, x)$ and $x \mapsto f(x, b)$ are linear for all $a \in V_1$ and $b \in V_2$. Such maps are also called bilinear and they will be studied rather extensively in the last chapter of the book. If $d = 3$, the condition is that $x \mapsto f(a, b, x)$, $x \mapsto f(a, x, c)$ and $x \mapsto f(x, b, c)$ should be linear for all $a \in V_1$, $b \in V_2$ and $c \in V_3$.

There is a catch with the previous definition: one might naively believe that a multilinear map is the same as a linear map $f : V_1 \times \dots \times V_d \to W$. This is definitely not the case: consider the map $f : \mathbb{R}^2 \to \mathbb{R}$ sending $(x, y)$ to $xy$. It is bilinear since for all $a$ the map $x \mapsto ax$ is linear, but the map $f$ is not linear, since
$$f(1, 0) + f(0, 1) = 0 \neq f((1, 0) + (0, 1)) = 1.$$
One can develop a whole theory (of tensor products) based on this observation, and the reader will find the basic results of this theory in a series of exercises at the end of this section (see the problems for practice section).

Though one can develop a whole theory in the general setting introduced before, we will specialize to the case $V_1 = V_2 = \dots = V_d$ and we will simply call this space $V$. Multilinear maps $f : V^d \to W$ will also be called $d$-linear maps. The next problem gives an important recipe which yields $d$-linear forms from linear forms.


Problem 7.2. Let $f_1, f_2, \dots, f_d : V \to F$ be linear forms and consider the map
$$f : V^d \to F, \quad (x_1, \dots, x_d) \mapsto f_1(x_1) \cdots f_d(x_d).$$
Prove that $f$ is $d$-linear.

Solution. If $i \in \{1, \dots, d\}$ and $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d \in V$ are fixed, then the map $x_i \mapsto f(x_1, \dots, x_d)$ is simply the map $x_i \mapsto a f_i(x_i)$, where $a = \prod_{j \neq i} f_j(x_j)$ is a scalar. Since $f_i$ is a linear form, so is $a f_i$, thus $x_i \mapsto a f_i(x_i)$ is a linear map and the result follows. □
Not all $d$-linear forms are just products of linear forms:

Problem 7.3. Prove that the map $f : (\mathbb{R}^2)^2 \to \mathbb{R}$ given by
$$f((x_1, x_2), (y_1, y_2)) = x_1 y_1 + x_2 y_2$$
is 2-linear, but is not a product of two linear maps, i.e., we cannot find linear maps $l_1, l_2 : \mathbb{R}^2 \to \mathbb{R}$ such that $f(x, y) = l_1(x) l_2(y)$ for all $x, y \in \mathbb{R}^2$.

Solution. If $x_1$ and $x_2$ are fixed, then it is not difficult to see that the map $(y_1, y_2) \mapsto x_1 y_1 + x_2 y_2$ is linear. Similarly, if $y_1, y_2$ are fixed, then the map $(x_1, x_2) \mapsto x_1 y_1 + x_2 y_2$ is linear. Thus $f$ is 2-linear. Assume by contradiction that $f(x, y) = l_1(x) l_2(y)$ for two linear maps $l_1, l_2 : \mathbb{R}^2 \to \mathbb{R}$ and for all $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2$. It follows that we can find real numbers $a = l_1(1, 0)$, $b = l_1(0, 1)$, $c = l_2(1, 0)$ and $d = l_2(0, 1)$ such that
$$x_1 y_1 + x_2 y_2 = (a x_1 + b x_2)(c y_1 + d y_2)$$
for all $x_1, y_1, x_2, y_2 \in \mathbb{R}$. We cannot have $(a, b) = (0, 0)$ (otherwise $f$ would vanish identically, while $f((1, 0), (1, 0)) = 1$), so assume without loss of generality that $b \neq 0$. Taking $x_2 = -\frac{a x_1}{b}$ we obtain
$$x_1 y_1 = \frac{a x_1 y_2}{b}$$
for all real numbers $x_1, y_1, y_2$. This is plainly absurd (take $x_1 = y_1 = 1$ and $y_2 = 0$) and the result follows. □


Let us now consider a $d$-linear map $f : V^d \to W$ and a permutation $\sigma \in S_d$. We define a new map $\sigma(f) : V^d \to W$ by
$$\sigma(f)(x_1, \dots, x_d) = f(x_{\sigma(1)}, \dots, x_{\sigma(d)}).$$
It follows easily from the definition of $d$-linear maps that $\sigma(f)$ is also a $d$-linear map. Moreover, for all $\sigma, \tau \in S_d$ and all $d$-linear maps $f$ we have the crucial relation (we say that the symmetric group $S_d$ acts on the space of $d$-linear maps)
$$(\sigma\tau)(f) = \sigma(\tau(f)). \tag{7.1}$$
Indeed, by definition we have
$$(\sigma\tau)(f)(x_1, \dots, x_d) = f(x_{\sigma(\tau(1))}, \dots, x_{\sigma(\tau(d))}),$$
while¹
$$\sigma(\tau(f))(x_1, \dots, x_d) = \tau(f)(x_{\sigma(1)}, \dots, x_{\sigma(d)}) = f(x_{\sigma(\tau(1))}, \dots, x_{\sigma(\tau(d))}).$$

¹Note that setting $y_i = x_{\sigma(i)}$, we have $y_{\tau(i)} = x_{\sigma(\tau(i))}$.
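Relation (7.1) is easy to test by brute force. In the sketch below (our own conventions, not the book's), permutations of $\{0, \dots, d-1\}$ are tuples with `sigma[i]` the image of `i`, `act(sigma, f)` builds $\sigma(f)$ as defined above, and `compose(sigma, tau)` is $i \mapsto \sigma(\tau(i))$; these names are hypothetical. We check $(\sigma\tau)(f) = \sigma(\tau(f))$ on sample vectors for all $\sigma, \tau \in S_3$, using a 3-linear form on $\mathbb{Q}^2$ with no symmetry.

```python
from itertools import permutations

def act(sigma, f):
    """sigma(f): (x_1, ..., x_d) -> f(x_sigma(1), ..., x_sigma(d)).

    Permutations are tuples on {0, ..., d-1}: sigma[i] is the image of i.
    """
    return lambda *xs: f(*(xs[sigma[i]] for i in range(len(sigma))))

def compose(sigma, tau):
    """The product sigma tau, i.e. i -> sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))

def f(x, y, z):
    """A 3-linear form on Q^2 (a product of three linear forms)."""
    return x[0] * y[1] * (z[0] + 3 * z[1])

xs = ((1, 2), (3, -1), (0, 5))
S3 = list(permutations(range(3)))
ok = all(act(compose(s, t), f)(*xs) == act(s, act(t, f))(*xs)
         for s in S3 for t in S3)
print(ok)  # True
```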
Recall (see the appendix on algebraic preliminaries for more details on permutations) that there is a special map $\varepsilon : S_d \to \{-1, 1\}$, the signature. The precise definition is
$$\varepsilon(\sigma) = \prod_{1 \leq i < j \leq d} \frac{\sigma(j) - \sigma(i)}{j - i}.$$
This map $\varepsilon$ is multiplicative, that is
$$\varepsilon(\sigma\tau) = \varepsilon(\sigma)\varepsilon(\tau)$$
for all $\sigma, \tau \in S_d$. Recall that a transposition is a permutation $\sigma$ for which there are integers $i \neq j \in \{1, 2, \dots, d\}$ such that $\sigma(i) = j$, $\sigma(j) = i$ and $\sigma(k) = k$ for all $k \neq i, j$. In this case we write $\sigma = (i, j)$. We note that $\varepsilon(\sigma) = -1$ for any transposition $\sigma$. We also recall that any permutation is a product of transpositions.
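The two properties of the signature used below can be checked by brute force on $S_4$. This sketch assumes the product formula $\varepsilon(\sigma) = \prod_{i<j} \frac{\sigma(j) - \sigma(i)}{j - i}$, one standard definition of the signature, evaluated exactly with fractions; permutations are 0-based tuples and the helper names are hypothetical. It checks multiplicativity and that every transposition has signature $-1$.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def signature(sigma):
    """epsilon(sigma) = prod_{i<j} (sigma(j) - sigma(i)) / (j - i).

    sigma is a tuple on {0, ..., d-1}; the product is computed exactly
    with Fractions and is always +1 or -1.
    """
    d = len(sigma)
    value = prod((Fraction(sigma[j] - sigma[i], j - i)
                  for i in range(d) for j in range(i + 1, d)),
                 start=Fraction(1))
    return int(value)

def compose(sigma, tau):
    """The product sigma tau, i.e. i -> sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))

def transposition(d, i, j):
    """The transposition exchanging i and j in {0, ..., d-1}."""
    t = list(range(d))
    t[i], t[j] = t[j], t[i]
    return tuple(t)

S4 = list(permutations(range(4)))
print(all(signature(compose(s, t)) == signature(s) * signature(t)
          for s in S4 for t in S4))                                  # True
print({signature(transposition(4, i, j))
       for i in range(4) for j in range(i + 1, 4)})                  # {-1}
```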
We now introduce two fundamental classes of $d$-linear maps:


Definition 7.4. Let $f : V^d \to W$ be a $d$-linear map.

a) We say that $f$ is antisymmetric if $\sigma(f) = \varepsilon(\sigma) f$ for all $\sigma \in S_d$.
b) We say that $f$ is alternating if $f(x_1, x_2, \dots, x_d) = 0$ whenever $x_1, x_2, \dots, x_d \in V$ are not pairwise distinct.

The two definitions look quite different, but most of the time they are equivalent. There are, however, some subtleties related to the field $F$, as the following problems show. Still, the reader should keep in mind that over fields such as the real, rational, or complex numbers there is no difference between alternating and antisymmetric $d$-linear maps.


Problem 7.5. Prove that an alternating $d$-linear map $f : V^d \to W$ is antisymmetric.

Solution. Since any permutation is a product of transpositions and since $\varepsilon$ is multiplicative, relation (7.1) reduces the problem to proving that $\tau(f) = -f$ for any transposition $\tau = (i, j)$, with $i < j$. Consider arbitrary vectors $x_1, x_2, \dots, x_d$ and note that
$$f(x_1, \dots, x_{i-1}, x_i + x_j, x_{i+1}, \dots, x_{j-1}, x_i + x_j, x_{j+1}, \dots, x_d) = 0$$
since $f$ is alternating. Using the $d$-linearity of $f$, and displaying only the entries in positions $i$ and $j$ (the entry in any other position $k$ being $x_k$), the previous relation can be written
$$f(\dots, x_i, \dots, x_i, \dots) + f(\dots, x_j, \dots, x_j, \dots) + f(\dots, x_j, \dots, x_i, \dots) + f(\dots, x_i, \dots, x_j, \dots) = 0.$$
Using again the fact that $f$ is alternating, it follows that the first two terms in the above sum are zero. The third term is $\tau(f)(x_1, \dots, x_d)$ and the fourth is $f(x_1, \dots, x_d)$, so we obtain $\tau(f)(x_1, \dots, x_d) = -f(x_1, \dots, x_d)$, which is the desired result. □


Problem 7.6. Suppose that $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$. Prove that an antisymmetric $d$-linear map $f : V^d \to W$ (with $V, W$ vector spaces over $F$) is alternating. Thus over such a field $F$ there is no difference between antisymmetric and alternating $d$-linear maps.

Solution. Suppose that $x_1, \dots, x_d$ are not pairwise distinct, say $x_i = x_j$ for some $i < j$. Consider the transposition $\tau = (i, j)$. Since $f$ is antisymmetric and $\varepsilon(\tau) = -1$, we deduce that $\tau(f) = -f$. Evaluating this equality at $(x_1, \dots, x_d)$ yields
$$f(x_1, \dots, x_{i-1}, x_j, x_{i+1}, \dots, x_{j-1}, x_i, x_{j+1}, \dots, x_d) = -f(x_1, \dots, x_d).$$
But since $x_i = x_j$, the left-hand side equals $f(x_1, \dots, x_d)$, so the previous relation can be written
$$2f(x_1, \dots, x_d) = 0.$$
Since $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$, the previous relation yields $f(x_1, \dots, x_d) = 0$ (note that this would be completely wrong if we had $F = \mathbb{F}_2$, see also the example below). Thus $f$ is alternating. □
Example 7.7. Bad things happen when $F = \mathbb{F}_2$. Let $f : F^2 \to F$ be the
multiplication map, that is $f(x, y) = xy$. It is clearly bilinear and it is not
alternating, since $f(1, 1) = 1 \neq 0$. On the other hand, $f$ is antisymmetric. Indeed,
we only need to check that $f(x, y) = -f(y, x)$, or equivalently $2xy = 0$. This
holds since $2 = 1 + 1 = 0$ in $\mathbb{F}_2$.
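A brute-force check of Example 7.7 takes only a few lines. The sketch below represents $\mathbb{F}_2$ as the integers $\{0, 1\}$ with arithmetic modulo 2 (a representation choice of ours, not something from the text) and tests both properties of $f(x, y) = xy$ over all pairs of elements.

```python
# Model F_2 as {0, 1} with arithmetic modulo 2 (an assumed representation).
F2 = (0, 1)

def f(x, y):
    """The multiplication map f(x, y) = xy on F_2."""
    return (x * y) % 2

def is_antisymmetric(g):
    # g(x, y) == -g(y, x) in F_2, i.e. g(x, y) + g(y, x) == 0 (mod 2)
    return all((g(x, y) + g(y, x)) % 2 == 0 for x in F2 for y in F2)

def is_alternating(g):
    # g(x, x) == 0 for every x in F_2
    return all(g(x, x) == 0 for x in F2)

print(is_antisymmetric(f), is_alternating(f))  # True False
```

The failure of the alternating property comes entirely from the pair $(1, 1)$, exactly as in the example.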


A natural question is how to construct antisymmetric or alternating $d$-linear
maps. The following problem shows that starting with any $d$-linear map $f$ we can
obtain an antisymmetric one by taking the signed sum of the maps $\sigma(f)$, $\sigma \in S_d$. This
will play a crucial role in the next section, when defining the determinant of a family
of vectors.


Problem 7.8. Let $f : V^d \to W$ be a $d$-linear map. Prove that

$$A(f) = \sum_{\sigma \in S_d} \varepsilon(\sigma)\, \sigma(f)$$

is an antisymmetric $d$-linear map.


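Solution. It is clear that $A(f)$ is a $d$-linear map, since it is a linear combination of
$d$-linear maps. Let $\tau \in S_d$ and let us prove that $\tau(A(f)) = \varepsilon(\tau)A(f)$. Note that by
relation (7.1) we have

$$\tau(A(f)) = \sum_{\sigma \in S_d} \varepsilon(\sigma)\, \tau(\sigma(f)) = \sum_{\sigma \in S_d} \varepsilon(\sigma)\, (\tau\sigma)(f).$$

Thus, using the fact that $\varepsilon(\tau)\varepsilon(\sigma) = \varepsilon(\tau\sigma)$, we obtain

$$\varepsilon(\tau)\, \tau(A(f)) = \sum_{\sigma \in S_d} \varepsilon(\tau\sigma)\, (\tau\sigma)(f).$$

Note that the map $\sigma \mapsto \tau\sigma$ is a permutation of $S_d$ (its inverse being simply $\sigma \mapsto
\tau^{-1}\sigma$), thus the last sum equals $\sum_{\sigma \in S_d} \varepsilon(\sigma)\sigma(f) = A(f)$. We conclude that

$$\varepsilon(\tau)\, \tau(A(f)) = A(f)$$

and the result follows, since $\varepsilon(\tau)^{-1} = \varepsilon(\tau)$. □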
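To see Problem 7.8 at work, here is a small Python sketch of the operator $A$. It assumes the convention $\sigma(f)(x_1, \ldots, x_d) = f(x_{\sigma(1)}, \ldots, x_{\sigma(d)})$ (the one used in the next section for $A(f)$), computes $\varepsilon(\sigma)$ from the number of inversions of $\sigma$, and applies $A$ to a hypothetical trilinear map on $\mathbb{Z}^3$ chosen only for illustration.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of (0, ..., d-1), via the parity of its inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def antisymmetrize(f, d):
    """Return A(f): (x_1, ..., x_d) -> sum over sigma of eps(sigma) * f(x_sigma(1), ..., x_sigma(d))."""
    def A_f(*xs):
        return sum(sign(p) * f(*(xs[p[k]] for k in range(d)))
                   for p in permutations(range(d)))
    return A_f

def f(x, y, z):
    # Hypothetical trilinear map on Z^3; it is neither antisymmetric nor alternating.
    return x[0] * y[1] * z[2] + 4 * x[2] * y[2] * z[0]

Af = antisymmetrize(f, 3)
x, y, z = (1, 2, 0), (0, 1, 3), (4, 0, 1)
# Swapping two arguments flips the sign of A(f).
print(Af(y, x, z) == -Af(x, y, z))  # True
```

The same function works for any $d$, but the sum has $d!$ terms, so it is only meant for small experiments.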


A crucial property of alternating $d$-linear forms, which actually characterizes
them, is the following.

Theorem 7.9. Let $f : V^d \to W$ be an alternating $d$-linear form. If $x_1,
x_2, \ldots, x_d \in V$ are linearly dependent, then $f(x_1, x_2, \ldots, x_d) = 0$.
Proof. Since $x_1, \ldots, x_d$ are linearly dependent, some $x_i$ lies in the span of
$(x_j)_{j \neq i}$, say

$$x_i = \sum_{j \neq i} a_j x_j$$

for some scalars $a_j$. Then using the $d$-linearity of $f$, we obtain

$$f(x_1, \ldots, x_d) = \sum_{j \neq i} a_j\, f(x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_d).$$

As $f$ is alternating, each of the terms $f(x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_d)$ is zero, since
$x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_d$ are not pairwise distinct ($x_j$ occurs in positions $i$ and $j$). Thus $f(x_1, \ldots, x_d) = 0$. □


7.1.1 Problems for Practice 


Let $F$ be a field and let $V_1, \ldots, V_d$ be finite dimensional vector spaces over $F$.
We define the tensor product $V_1 \otimes \cdots \otimes V_d$ of $V_1, \ldots, V_d$ as the set of multilinear
maps $f : V_1^* \times \cdots \times V_d^* \to F$, where $V_i^*$ is the dual of $V_i$.


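1. Check that $V_1 \otimes \cdots \otimes V_d$ is a vector subspace of the vector space of all maps
$f : V_1^* \times \cdots \times V_d^* \to F$.
2. If $v_i \in V_i$ for all $1 \le i \le d$, define a map $v_1 \otimes \cdots \otimes v_d : V_1^* \times \cdots \times V_d^* \to F$ by

$$(v_1 \otimes \cdots \otimes v_d)(f_1, \ldots, f_d) = f_1(v_1) f_2(v_2) \cdots f_d(v_d)$$
for $f_i \in V_i^*$.
a) Prove that $v_1 \otimes \cdots \otimes v_d \in V_1 \otimes \cdots \otimes V_d$ (elements of $V_1 \otimes \cdots \otimes V_d$ of the
form $v_1 \otimes \cdots \otimes v_d$ are called pure tensors).

b) Is the map $V_1 \times \cdots \times V_d \to V_1 \otimes \cdots \otimes V_d$ sending $(v_1, \ldots, v_d)$ to $v_1 \otimes \cdots \otimes v_d$
linear? Is it multilinear?

c) Is every element of $V_1 \otimes \cdots \otimes V_d$ a pure tensor?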


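3. For each $1 \le i \le d$ let $(e_{i,j})_{1 \le j \le n_i}$ be a basis of $V_i$. Let $(e_{i,j}^*)_{1 \le j \le n_i}$ be the
associated dual basis of $V_i^*$.

a) Prove that for any $f \in V_1 \otimes \cdots \otimes V_d$ we have

$$f = \sum_{j_1=1}^{n_1} \cdots \sum_{j_d=1}^{n_d} f(e_{1,j_1}^*, \ldots, e_{d,j_d}^*)\, e_{1,j_1} \otimes \cdots \otimes e_{d,j_d}.$$

b) Prove that the family of pure tensors $e_{1,j_1} \otimes \cdots \otimes e_{d,j_d}$, where $1 \le j_1 \le n_1, \ldots,
1 \le j_d \le n_d$, forms a basis of $V_1 \otimes \cdots \otimes V_d$.
c) Prove that

$$\dim(V_1 \otimes \cdots \otimes V_d) = \dim V_1 \cdots \dim V_d.$$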


4. Prove that $V_1 \otimes \cdots \otimes V_d$ has the following universal property: for any vector
space $W$ over $F$ and any multilinear map $f : V_1 \times \cdots \times V_d \to W$ there is a
unique linear map $g : V_1 \otimes \cdots \otimes V_d \to W$ such that

$$g(v_1 \otimes \cdots \otimes v_d) = f(v_1, \ldots, v_d)$$

for all $v_i \in V_i$, $1 \le i \le d$.

5. Prove that there is an isomorphism $(V_1 \otimes V_2) \otimes V_3 \to V_1 \otimes V_2 \otimes V_3$ sending
$(v_1 \otimes v_2) \otimes v_3$ to $v_1 \otimes v_2 \otimes v_3$ for all $v_1 \in V_1$, $v_2 \in V_2$, $v_3 \in V_3$.

6. Prove that there is an isomorphism $V_1^* \otimes V_2^* \to (V_1 \otimes V_2)^*$.

7. Prove that there is an isomorphism $V_1^* \otimes V_2 \to \operatorname{Hom}(V_1, V_2)$ sending $f_1 \otimes v_2$ to
the map $v_1 \mapsto f_1(v_1)v_2$ for all $f_1 \in V_1^*$ and $v_2 \in V_2$. We recall that $\operatorname{Hom}(V_1, V_2)$
is the vector space of linear maps between $V_1$ and $V_2$.


7.2 Determinant of a Family of Vectors, of a Matrix, 
and of a Linear Transformation 


Let $V$ be a vector space over $F$, of dimension $n \ge 1$. Let $(v_1, v_2, \ldots, v_n)$ be an
$n$-tuple of vectors in $V$ forming a basis of $V$. The order of $v_1, v_2, \ldots, v_n$ will be
very important in the sequel, so one should not consider only the set $\{v_1, \ldots, v_n\}$,
but the $n$-tuple $(v_1, \ldots, v_n)$.

Consider the dual basis $v_1^*, \ldots, v_n^*$ of the dual space $V^*$. Recall that $v_i^*$ is the
linear form on $V$ such that

$$v_i^*(x_1 v_1 + \cdots + x_n v_n) = x_i$$

for all $x_1, \ldots, x_n \in F$. That is, $v_i^*(v)$ is the $i$th coordinate of $v$ when expressed as a
linear combination of $v_1, \ldots, v_n$.
By Problem 7.2 the map

$$f : V^n \to F, \quad (x_1, \ldots, x_n) \mapsto v_1^*(x_1) \cdots v_n^*(x_n)$$

is an $n$-linear form. By Problem 7.8, the map

$$A(f) : V^n \to F, \quad A(f)(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, f(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$$

is an antisymmetric $n$-linear form.


Definition 7.10. Let $f$ be as above and let $x_1, \ldots, x_n \in V$. We call
$A(f)(x_1, \ldots, x_n)$ the determinant of $x_1, \ldots, x_n$ with respect to $(v_1, \ldots, v_n)$ and
denote it $\det_{(v_1, \ldots, v_n)}(x_1, \ldots, x_n)$.


Remark 7.11. 1) By definition we have

$$\det_{(v_1, \ldots, v_n)}(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, v_1^*(x_{\sigma(1)}) \cdots v_n^*(x_{\sigma(n)}). \tag{7.2}$$

In other words, if we write

$$x_i = \sum_{j=1}^n a_{ji} v_j$$

for some scalars $a_{ji} \in F$ (which we can always do, since $v_1, \ldots, v_n$ is a basis
of $V$), then

$$\det_{(v_1, \ldots, v_n)}(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$$

2) We claim that $\det_{(v_1, \ldots, v_n)}(v_1, \ldots, v_n) = 1$. Indeed, suppose that $v_1^*(v_{\sigma(1)}) \cdots
v_n^*(v_{\sigma(n)})$ is a nonzero term appearing in the right-hand side of relation (7.2).
Then $v_i^*(v_{\sigma(i)})$ is nonzero for all $i \in [1, n]$, which forces $\sigma(i) = i$ for all $i$.

Thus the only nonzero term appearing in the right-hand side of (7.2) is the one
corresponding to $\sigma = \mathrm{id}$, which is clearly equal to 1. This proves the claim.
3) The geometric interpretation of the determinant is as follows: consider $F = \mathbb{R}$
and let $e_1, e_2, \ldots, e_n$ be the canonical basis of $\mathbb{R}^n$. If $x_1, x_2, \ldots, x_n$ are vectors in
$\mathbb{R}^n$, we write $\det(x_1, x_2, \ldots, x_n)$ instead of $\det_{(e_1, e_2, \ldots, e_n)}(x_1, x_2, \ldots, x_n)$. We can
associate to the vectors $x_1, x_2, \ldots, x_n$ the parallelepiped

$$P(x_1, x_2, \ldots, x_n) = \{a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \mid a_1, \ldots, a_n \in [0, 1]\}.$$

For instance, if $x_i = e_i$ for all $i$, then the associated parallelepiped is the
hypercube $[0, 1]^n$. The geometric interpretation of $\det(x_1, x_2, \ldots, x_n)$ is given by
the fundamental equality

$$|\det(x_1, x_2, \ldots, x_n)| = \operatorname{vol}(P(x_1, x_2, \ldots, x_n)),$$

the volume being taken here with respect to the Lebesgue measure on $\mathbb{R}^n$ (this is
the usual area when $n = 2$ and the usual volume when $n = 3$).
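For $n = 2$ the equality $|\det(x_1, x_2)| = \operatorname{vol}(P(x_1, x_2))$ can be checked numerically. For $x_1 = (a, c)$ and $x_2 = (b, d)$ the definition gives $\det(x_1, x_2) = ad - bc$; the sketch below compares $|ad - bc|$ with the area of the parallelogram $P(x_1, x_2)$ computed independently by the shoelace formula (our choice of independent area computation, not part of the text).

```python
def det2(x1, x2):
    """det(x1, x2) with respect to the canonical basis of R^2, for x1 = (a, c), x2 = (b, d): ad - bc."""
    (a, c), (b, d) = x1, x2
    return a * d - b * c

def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in cyclic order (shoelace formula)."""
    twice_signed_area = 0
    for k in range(len(vertices)):
        (p, q), (r, s) = vertices[k], vertices[(k + 1) % len(vertices)]
        twice_signed_area += p * s - r * q
    return abs(twice_signed_area) / 2

def parallelogram(x1, x2):
    """Vertices of P(x1, x2) = {a1*x1 + a2*x2 : a1, a2 in [0, 1]}, in cyclic order."""
    return [(0, 0), tuple(x1), (x1[0] + x2[0], x1[1] + x2[1]), tuple(x2)]

x1, x2 = (1, 3), (2, 4)
print(abs(det2(x1, x2)), shoelace_area(parallelogram(x1, x2)))  # 2 2.0
```

The sign of $\det(x_1, x_2)$, which the absolute value discards, records whether the vertices $0, x_1, x_1 + x_2, x_2$ are traversed counterclockwise or clockwise.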


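Example 7.12. Consider the vector space $V = F^2$ over $F$ and let $e_1, e_2$ be the
canonical basis of $V$. For any vectors $x_1 = \begin{pmatrix} a \\ c \end{pmatrix}$ and $x_2 = \begin{pmatrix} b \\ d \end{pmatrix}$ in $V$ we have

$$\det_{(e_1, e_2)}(x_1, x_2) = ad - bc.$$

For instance

$$\det_{(e_1, e_2)}\left(\begin{pmatrix} 1 \\ 3 \end{pmatrix}, \begin{pmatrix} 2 \\ 4 \end{pmatrix}\right) = 1 \cdot 4 - 2 \cdot 3 = -2.$$

Here is the first big theorem concerning determinants: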


Theorem 7.13. Let $v_1, \ldots, v_n$ be a basis of a vector space $V$ over $F$. The
determinant map $\det_{(v_1, \ldots, v_n)} : V^n \to F$ with respect to this basis is $n$-linear and
alternating.


Proof. Denote $f = \det_{(v_1, \ldots, v_n)}$. By Definition 7.10 and the discussion preceding it
we know that $f$ is $n$-linear and antisymmetric. If $F \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$, then Problem 7.6
shows that $f$ is alternating. Let us give a proof which works for any field $F$ (the
reader interested only in fields such as $\mathbb{R}, \mathbb{C}, \mathbb{Q}$ may skip the following technical
proof).

Let $x_1, \ldots, x_n \in V$ and suppose that they are not pairwise distinct, say $x_i = x_j$
for some $i < j$. Let $\tau = (i, j)$, a transposition, and let $A_n$ be the set of even
permutations in $S_n$, that is those permutations $\sigma$ for which $\varepsilon(\sigma) = 1$. Since $\varepsilon(\tau\sigma) =
\varepsilon(\tau)\varepsilon(\sigma) = -\varepsilon(\sigma)$ for all $\sigma \in S_n$, we deduce that $S_n = A_n \cup \tau A_n$ (disjoint union)
and using formula (7.2) we can write

$$f(x_1, \ldots, x_n) = \sum_{\sigma \in A_n} v_1^*(x_{\sigma(1)}) \cdots v_n^*(x_{\sigma(n)}) - \sum_{\sigma \in A_n} v_1^*(x_{\tau\sigma(1)}) \cdots v_n^*(x_{\tau\sigma(n)}).$$

We claim that $x_{\tau\sigma(k)} = x_{\sigma(k)}$ for all $k$ and $\sigma \in A_n$, which clearly shows that
$f(x_1, \ldots, x_n) = 0$. The claim is clear if $\sigma(k) \notin \{i, j\}$, as then $\tau\sigma(k) = \sigma(k)$.
Suppose that $\sigma(k) = i$; then $\tau\sigma(k) = j$ and the claim comes down to $x_i = x_j$,
which holds by assumption. The argument being similar for $\sigma(k) = j$, the theorem
is proved. □
The second big theorem in the theory of determinants and multilinear maps is the
following:

Theorem 7.14. Let $(v_1, \ldots, v_n)$ be an $n$-tuple of vectors of $V$, forming a basis of $V$.
If $f : V^n \to F$ is any alternating $n$-linear form, then

$$f = f(v_1, \ldots, v_n) \cdot \det_{(v_1, \ldots, v_n)}.$$


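Proof. Let $x_1, \ldots, x_n$ be vectors in $V$ and write
$$x_i = a_{1i} v_1 + a_{2i} v_2 + \cdots + a_{ni} v_n$$

for some scalars $a_{ji}$. By part 1) of Remark 7.11 we have

$$\det_{(v_1, \ldots, v_n)}(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$$

On the other hand, repeatedly using the $n$-linearity of $f$, we can write

$$f(x_1, \ldots, x_n) = f(a_{11} v_1 + \cdots + a_{n1} v_n, x_2, \ldots, x_n) = \sum_{i=1}^n a_{i1}\, f(v_i, x_2, \ldots, x_n)$$
$$= \sum_{i, j = 1}^n a_{i1} a_{j2}\, f(v_i, v_j, x_3, \ldots, x_n) = \cdots = \sum_{i_1, \ldots, i_n = 1}^n a_{i_1 1} \cdots a_{i_n n}\, f(v_{i_1}, \ldots, v_{i_n}).$$

Now, since $f$ is alternating, we have $f(v_{i_1}, \ldots, v_{i_n}) = 0$ unless $i_1, \ldots, i_n$ are
pairwise distinct, i.e., unless there is a permutation $\sigma \in S_n$ such that $\sigma(k) = i_k$
for $1 \le k \le n$. We conclude that

$$f(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} a_{\sigma(1)1} \cdots a_{\sigma(n)n}\, f(v_{\sigma(1)}, \ldots, v_{\sigma(n)}).$$

Since $f$ is antisymmetric (by Problem 7.5 and the alternating property of $f$), we
have $f(v_{\sigma(1)}, \ldots, v_{\sigma(n)}) = \varepsilon(\sigma) f(v_1, \ldots, v_n)$. Moreover, $a_{\sigma(1)1} \cdots a_{\sigma(n)n} = a_{1\sigma^{-1}(1)} \cdots a_{n\sigma^{-1}(n)}$,
$\varepsilon(\sigma^{-1}) = \varepsilon(\sigma)$, and $\sigma^{-1}$ runs over $S_n$ when $\sigma$ does. Hence we
can further rewrite the last equality as

$$f(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}\, f(v_1, \ldots, v_n) = \det_{(v_1, \ldots, v_n)}(x_1, \ldots, x_n)\, f(v_1, \ldots, v_n),$$

and the result follows. □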
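Theorem 7.14, together with part 2) of Remark 7.11, can be sanity-checked on $\mathbb{Q}^3$. The sketch below computes $\det_{(v_1, \ldots, v_n)}(x_1, \ldots, x_n)$ by finding the coordinates $a_{ji}$ of each $x_i$ in the basis $(v_1, \ldots, v_n)$ with exact Gauss-Jordan elimination (a helper of our own, called `coordinates` here) and then applying the formula of Remark 7.11. It checks the instance $f = \det_{(e_1, \ldots, e_n)}$ of the theorem, namely $\det_{(e)}(x) = \det_{(e)}(v) \cdot \det_{(v)}(x)$; the vectors are arbitrary test data.

```python
from fractions import Fraction
from itertools import permutations

def sign(perm):
    """Sign of a permutation, via the parity of its number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def leibniz(M):
    """sum over sigma of eps(sigma) * M[0][sigma(0)] * ... * M[n-1][sigma(n-1)]."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= M[i][p[i]]
        total += term
    return total

def coordinates(basis, x):
    """Coordinates of x in `basis` (n vectors of Q^n), by Gauss-Jordan elimination."""
    n = len(basis)
    # Augmented matrix [v_1 ... v_n | x]: column j holds basis[j].
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(x[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)  # exists since basis is a basis
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def det_wrt(basis, xs):
    """det_(v_1..v_n)(x_1..x_n) = sum eps(sigma) a_{1 sigma(1)}...a_{n sigma(n)}, x_i = sum_j a_{ji} v_j."""
    cols = [coordinates(basis, x) for x in xs]
    n = len(xs)
    a = [[cols[i][j] for i in range(n)] for j in range(n)]  # a[j][i] holds a_{ji}
    return leibniz(a)

e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
v = [(1, 1, 0), (0, 2, 1), (1, 0, 3)]
x = [(2, 0, 1), (1, 1, 1), (0, 3, 4)]
print(det_wrt(v, v))  # 1
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the identity is tested with equality rather than up to rounding.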
Let us record two important consequences of Theorem 7.14.

Corollary 7.15. Let $V$ be a vector space of dimension $n \ge 1$. The vector space of
$n$-linear alternating forms $f : V^n \to F$ has dimension 1.
Proof. Consider a basis $v_1, \ldots, v_n$ of $V$ and let $f = \det_{(v_1, \ldots, v_n)}$. By Theorem 7.13,
the map $f$ is an alternating $n$-linear form. By part 2) of Remark 7.11 we have
$f(v_1, \ldots, v_n) = 1$, thus $f$ is nonzero. On the other hand, Theorem 7.14 shows
that any alternating $n$-linear form is a scalar multiple of $\det_{(v_1, \ldots, v_n)}$. The result
follows. □
Corollary 7.16. Given a basis $v_1, v_2, \ldots, v_n$ of a vector space $V$ over $F$, there is a
unique $n$-linear alternating form $f : V^n \to F$ such that $f(v_1, v_2, \ldots, v_n) = 1$. This
form is $\det_{(v_1, \ldots, v_n)}$.

Proof. Uniqueness follows directly from Theorem 7.14. The existence has already
been established during the proof of the last corollary. □


Theorem 7.14 can also be used to establish a criterion to decide when a family
of vectors forms a basis of a finite dimensional vector space: it all comes down to
computing determinants, and we will see quite a few methods to compute them in
the next sections (however, in practice the method explained before Problem 4.34,
based on row-reduction, is the most efficient one).
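A row-reduction computation of the determinant can be sketched as follows. This is a generic Gaussian elimination over $\mathbb{Q}$, and we do not claim it reproduces the exact procedure described before Problem 4.34. It relies on standard facts not yet proved at this point of the text: swapping two rows changes the sign of the determinant, adding a multiple of one row to another leaves it unchanged, and the determinant of an upper-triangular matrix is the product of its diagonal entries. For comparison, the defining formula (7.5) is also implemented directly.

```python
from fractions import Fraction
from itertools import permutations

def det_leibniz(A):
    """Defining formula: sum over sigma of eps(sigma) a_{1 sigma(1)} ... a_{n sigma(n)}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def det_row_reduction(A):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(a) for a in row] for row in A]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: the matrix is singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row swap flips the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # Subtracting a multiple of the pivot row leaves the determinant unchanged.
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return det

A = [[0, 1, 2], [1, 0, 3], [4, -3, 8]]
print(det_row_reduction(A), det_leibniz(A))  # -2 -2
```

The defining formula needs on the order of $n \cdot n!$ multiplications, while elimination needs on the order of $n^3$ arithmetic operations, which is why row-reduction wins in practice.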


Corollary 7.17. Let $V$ be a vector space of dimension $n$ over $F$ and let
$x_1, x_2, \ldots, x_n \in V$. The following assertions are equivalent:
a) $x_1, x_2, \ldots, x_n$ form a basis of $V$ (or, equivalently, they are linearly independent).
b) For any basis $v_1, v_2, \ldots, v_n$ of $V$ we have
$$\det_{(v_1, v_2, \ldots, v_n)}(x_1, x_2, \ldots, x_n) \neq 0.$$
c) There is a basis $v_1, v_2, \ldots, v_n$ of $V$ such that
$$\det_{(v_1, v_2, \ldots, v_n)}(x_1, x_2, \ldots, x_n) \neq 0.$$

Proof. Suppose that a) holds and let $v_1, \ldots, v_n$ be a basis of $V$. By Theorem 7.14
applied to $f = \det_{(x_1, \ldots, x_n)}$, we have

$$\det_{(x_1, \ldots, x_n)}(x_1, \ldots, x_n) = \det_{(x_1, \ldots, x_n)}(v_1, \ldots, v_n) \cdot \det_{(v_1, \ldots, v_n)}(x_1, \ldots, x_n).$$

By Remark 7.11 the left-hand side is 1, thus both factors in the right-hand side are
nonzero, establishing b). It is clear that b) implies c), so assume that c) holds and
let us prove a). Since $\dim V = n$, it suffices to check that $x_1, x_2, \ldots, x_n$ are linearly
independent. If this is not the case, we deduce from Theorems 7.13 and 7.9 that
$\det_{(v_1, v_2, \ldots, v_n)}(x_1, x_2, \ldots, x_n) = 0$, a contradiction. □


Problem 7.18. Let $V$ be a finite dimensional $F$-vector space, let $e_1, \ldots, e_n$ be a
basis of $V$ and let $T : V \to V$ be a linear transformation. Prove that for all
$v_1, \ldots, v_n \in V$ we have

$$\sum_{i=1}^n \det(v_1, \ldots, v_{i-1}, T(v_i), v_{i+1}, \ldots, v_n) = \operatorname{Tr}(T) \cdot \det(v_1, \ldots, v_n),$$

where all determinants are computed with respect to the basis $e_1, \ldots, e_n$ and where
$\operatorname{Tr}(T)$ is the trace of the matrix of $T$ with respect to the basis $e_1, \ldots, e_n$.


Solution. Consider the map

$$\varphi : V^n \to F, \quad \varphi(v_1, \ldots, v_n) = \sum_{i=1}^n \det(v_1, \ldots, v_{i-1}, T(v_i), v_{i+1}, \ldots, v_n).$$

This map is a sum of $n$-linear maps, thus it is $n$-linear. Moreover, it is
alternating. Indeed, assume for example that $v_1 = v_2$. Then $\det(v_1, \ldots, v_{i-1}, T(v_i),
v_{i+1}, \ldots, v_n) = 0$ for $i > 2$ and

$$\det(T(v_1), v_2, \ldots, v_n) + \det(v_1, T(v_2), \ldots, v_n) =
\det(T(v_1), v_1, v_3, \ldots, v_n) + \det(v_1, T(v_1), v_3, \ldots, v_n) = 0,$$

since the determinant is antisymmetric.
Since the space of $n$-linear alternating forms on $V$ is one-dimensional, it follows
that we can find a scalar $\alpha \in F$ such that

$$\varphi(v_1, \ldots, v_n) = \alpha \det(v_1, \ldots, v_n)$$

for all $v_1, \ldots, v_n$. Choose $v_1 = e_1, \ldots, v_n = e_n$ and let $A = [a_{ij}]$ be the matrix of
$T$ with respect to $e_1, \ldots, e_n$, so that $T(e_i) = \sum_{j=1}^n a_{ji} e_j$. Then the right-hand side equals $\alpha$, while the left-hand
side equals

$$\sum_{i=1}^n \det\Big(e_1, \ldots, e_{i-1}, \sum_{j=1}^n a_{ji} e_j, e_{i+1}, \ldots, e_n\Big) = \sum_{i=1}^n \sum_{j=1}^n a_{ji} \det(e_1, \ldots, e_{i-1}, e_j, e_{i+1}, \ldots, e_n) = \sum_{i=1}^n a_{ii},$$

the last equality being a consequence of the fact that the determinant map is
alternating (only the terms with $j = i$ survive, and $\det(e_1, \ldots, e_n) = 1$). Since $\sum_{i=1}^n a_{ii} = \operatorname{Tr}(T)$, we conclude that $\alpha = \operatorname{Tr}(T)$ and we are
done. □
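The identity of Problem 7.18 is easy to test on an example. The sketch below works in $\mathbb{Q}^3$ with the canonical basis, takes $T(X) = AX$ for an integer matrix $A$ of our choosing, and compares both sides; the determinant of a family of vectors is computed with the formula of Remark 7.11, i.e. as the determinant of the matrix whose $i$th column is $v_i$.

```python
from itertools import permutations

def det(M):
    """sum over sigma of eps(sigma) * M[0][sigma(0)] * ... * M[n-1][sigma(n-1)]."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= M[i][p[i]]
        total += term
    return total

def det_of_vectors(vs):
    """det_(e_1..e_n)(v_1..v_n): the determinant of the matrix whose i-th column is vs[i]."""
    n = len(vs)
    return det([[vs[i][j] for i in range(n)] for j in range(n)])

def apply(A, v):
    """The vector A v."""
    return [sum(A[r][c] * v[c] for c in range(len(v))) for r in range(len(A))]

def lhs(A, vs):
    """sum over i of det(v_1, ..., v_{i-1}, T(v_i), v_{i+1}, ..., v_n), with T(v) = A v."""
    total = 0
    for i in range(len(vs)):
        ws = list(vs)
        ws[i] = apply(A, vs[i])
        total += det_of_vectors(ws)
    return total

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
vs = [[1, 0, 2], [0, 1, 1], [3, 1, 0]]
print(lhs(A, vs) == trace(A) * det_of_vectors(vs))  # True
```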


Remark 7.19. $\operatorname{Tr}(T)$ is actually independent of the choice of the basis $e_1, \ldots, e_n$
and it is called the trace of $T$. To prove the independence with respect to the choice
of the basis, we need to prove that for all $A \in M_n(F)$ and all $P \in GL_n(F)$ we have

$$\operatorname{Tr}(A) = \operatorname{Tr}(PAP^{-1}).$$

By a fundamental property of the trace map (which the reader can check without
any difficulty) we have

$$\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$$

for all matrices $A, B \in M_n(F)$. Thus
$$\operatorname{Tr}(PAP^{-1}) = \operatorname{Tr}((PA)P^{-1}) = \operatorname{Tr}(P^{-1}(PA)) = \operatorname{Tr}((P^{-1}P)A) = \operatorname{Tr}(A).$$
Consider a vector space $V$ of dimension $n \ge 1$ and a linear transformation
$T : V \to V$. If $f : V^n \to F$ is an $n$-linear form, then one can easily check that
• the map
$$T_f : V^n \to F, \quad (x_1, \ldots, x_n) \mapsto f(T(x_1), \ldots, T(x_n))$$
is also an $n$-linear form.
• If $f$ is alternating, then so is $T_f$.

Using these observations, we will prove the following fundamental theorem:


Theorem 7.20. Let $V$ be a vector space of dimension $n \ge 1$ over $F$. For any linear
transformation $T : V \to V$ there is a unique scalar $\det T \in F$ such that

$$f(T(x_1), T(x_2), \ldots, T(x_n)) = \det T \cdot f(x_1, x_2, \ldots, x_n) \tag{7.3}$$

for all $n$-linear alternating forms $f : V^n \to F$ and all $x_1, x_2, \ldots, x_n \in V$.


Proof. Fix a basis $v_1, v_2, \ldots, v_n$ of $V$ and denote $f_0 = \det_{(v_1, \ldots, v_n)}$. By Theorem 7.13
and Remark 7.11, $f_0$ is $n$-linear and alternating, and we have $f_0(v_1, \ldots, v_n) = 1$.

Since $(x_1, \ldots, x_n) \mapsto f_0(T(x_1), \ldots, T(x_n))$ is $n$-linear and alternating, it must
be a scalar multiple of $f_0$, thus we can find $\det T \in F$ such that

$$f_0(T(x_1), \ldots, T(x_n)) = \det T \cdot f_0(x_1, \ldots, x_n)$$

for all $x_1, \ldots, x_n \in V$. Since any $n$-linear alternating form $f$ is a scalar multiple
of $f_0$ (Corollary 7.15), it follows that relation (7.3) holds for any such map $f$
(since by definition of $\det T$ it holds for $f_0$), which establishes the existence part
of the theorem. Uniqueness is much easier: if relation (7.3) holds for all $f$ and all
$x_1, \ldots, x_n$, choosing $f = f_0$ and $x_i = v_i$ for all $i$ yields

$$\det T = f_0(T(v_1), \ldots, T(v_n)),$$

which clearly shows that $\det T$ is unique. □


Definition 7.21. The scalar $\det T$ is called the determinant of the linear
transformation $T$.

Note that the end of the proof of Theorem 7.20 gives an explicit formula

$$\det T = \det_{(v_1, \ldots, v_n)}(T(v_1), \ldots, T(v_n)) \tag{7.4}$$

and this for any choice of the basis $v_1, \ldots, v_n$ of $V$. In particular, the right-hand side
is independent of the choice of the basis!




Moreover, this allows us to express $\det T$ in terms of the matrix $A_T$ of $T$ with
respect to the basis $v_1, \ldots, v_n$. Recall that $A_T = [a_{ij}]$ with

$$T(v_i) = \sum_{j=1}^n a_{ji} v_j.$$

Following the proof of Theorem 7.14 (i.e., using the fact that $\det_{(v_1, \ldots, v_n)}$ is $n$-linear
and alternating, thus antisymmetric), we obtain

$$\det T = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$$

The right-hand side is expressed purely in terms of the matrix $A_T$, which
motivates the following:


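Definition 7.22. If $A = [a_{ij}] \in M_n(F)$, we define its determinant by

$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}. \tag{7.5}$$

We also write $\det A$ as

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$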
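Formula (7.5) translates directly into code. The sketch below (plain Python, exact integer arithmetic) computes $\varepsilon(\sigma)$ as $(-1)^{\text{number of inversions of } \sigma}$, a standard characterization of the signature that we assume here rather than re-derive. Since the sum has $n!$ terms, this is only practical for small $n$.

```python
from itertools import permutations
from math import prod

def signature(perm):
    """eps(sigma) = (-1)^(number of inversions of sigma)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant of a square matrix A (given as a list of rows) by formula (7.5)."""
    n = len(A)
    return sum(signature(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(det([[1, 2], [3, 4]]))                   # -2, as in Example 7.12
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 5]]))  # 30, the product of the diagonal entries
```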


Problem 7.23. Prove that the determinant of a diagonal matrix is the product of the
diagonal entries of that matrix. In particular $\det I_n = 1$.

Solution. Let $A = [a_{ij}]$ be a diagonal $n \times n$ matrix. Then

$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$$

Consider a nonzero term in the previous sum, corresponding to a permutation $\sigma$. We
have $a_{i\sigma(i)} \neq 0$ for all $i \in \{1, 2, \ldots, n\}$ and since $A$ is diagonal, this forces $\sigma(i) = i$
for all $i \in \{1, 2, \ldots, n\}$. It follows that the only possibly nonzero term in the above
sum is the one corresponding to the identity permutation, which equals $a_{11} \cdots a_{nn}$,
hence

$$\det A = a_{11} \cdots a_{nn},$$

as desired. □
Let us come back to our original situation: we have a vector space $V$ over $F$, a
linear transformation $T : V \to V$, a basis $v_1, \ldots, v_n$ of $V$ and the matrix $A_T$ of $T$
with respect to this basis. The previous discussion gives

$$\det T = \det A_T. \tag{7.6}$$
Note that the left-hand side is completely intrinsic to $T$ by Theorem 7.20; in
particular it is independent of the choice of $v_1, \ldots, v_n$. On the other hand, the matrix
$A_T$ certainly depends on the choice of the basis $v_1, \ldots, v_n$. The miracle is that while
$A_T$ depends on choices, its determinant does not! Let us glorify this observation,
since this is a very important result:

Theorem 7.24. If $A \in M_n(F)$, then $\det A = \det(PAP^{-1})$ for any invertible matrix
$P \in GL_n(F)$. In other words, similar matrices have the same determinant.
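A quick numerical illustration of Theorem 7.24, together with the trace invariance of Remark 7.19: below, $P$ is an upper unitriangular integer matrix whose inverse we wrote down by hand (and verify in the code), so $PAP^{-1}$ can be formed in exact integer arithmetic. The matrices are arbitrary choices for illustration.

```python
from itertools import permutations
from math import prod

def det(A):
    """Formula (7.5), with the signature computed from the number of inversions."""
    n = len(A)
    def eps(s):
        return -1 if sum(s[i] > s[j] for i in range(n) for j in range(i + 1, n)) % 2 else 1
    return sum(eps(s) * prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

P = [[1, 2, 0], [0, 1, 3], [0, 0, 1]]
P_inv = [[1, -2, 6], [0, 1, -3], [0, 0, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert matmul(P, P_inv) == I3  # P_inv really is the inverse of P

A = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
B = matmul(matmul(P, A), P_inv)  # B = P A P^{-1} is similar to A
print(det(A) == det(B), trace(A) == trace(B))  # True True
```

The matrix $B$ differs from $A$ entrywise; only its determinant and trace are guaranteed to agree.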


We can turn this discussion upside down: start now with any matrix $A \in M_n(F)$
and let $T : F^n \to F^n$ be the linear transformation sending $X \in F^n$ to $AX$. Then
$A$ is the matrix of $T$ with respect to the canonical basis $e_1, \ldots, e_n$ of $F^n$ and the
previous discussion shows that $\det A = \det T$. We deduce from Theorem 7.20 that

$$f(AX_1, \ldots, AX_n) = \det A \cdot f(X_1, \ldots, X_n)$$

for all $n$-linear alternating forms $f : (F^n)^n \to F$ and all $X_1, \ldots, X_n \in F^n$.


7.2.1 Problems for Practice 


1. Check that the general definition of the determinant of a matrix matches the
definition of the determinant of a matrix $A \in M_2(\mathbb{C})$ as seen in the chapter
concerned with square matrices of order 2.

2. Recall that a permutation matrix is a matrix $A \in M_n(\mathbb{R})$ having precisely one
nonzero entry in each row and column, and this nonzero entry is equal to 1.
Prove that the determinant of a permutation matrix is equal to $1$ or $-1$.

3. Let $A = [a_{ij}] \in M_n(\mathbb{C})$ and let $B = [(-1)^{i+j} a_{ij}] \in M_n(\mathbb{C})$. Compare $\det A$
and $\det B$.

4. Generalize the previous problem as follows: let $z$ be a complex number and let
$A = [a_{ij}] \in M_n(\mathbb{C})$ and $B = [z^{i+j} a_{ij}] \in M_n(\mathbb{C})$. Express $\det B$ in terms of
$\det A$ and $z$.

5. (The Wronskian) Let $f_1, f_2, \ldots, f_n$ be real-valued maps on some open interval
$I$ of $\mathbb{R}$. Assume that each of these maps is differentiable at least $n - 1$
times. For $x \in I$ let $W(f_1, \ldots, f_n)(x)$ be the determinant of the matrix $A =
[f_i^{(j-1)}(x)]_{1 \le i, j \le n}$, where $f_i^{(j)}$ is the $j$th derivative of $f_i$ (with the convention
that $f_i^{(0)} = f_i$). The map $x \mapsto W(f_1, \ldots, f_n)(x)$ is called the Wronskian of
$f_1, \ldots, f_n$.
a) Take $n = 2$ and $f_1(x) = e^{ax}$, $f_2(x) = e^{bx}$ for two real numbers $a, b$.
Compute the Wronskian of $f_1, f_2$.
b) Prove that if $f_1, \ldots, f_n$ are linearly dependent, then

$$W(f_1, \ldots, f_n) = 0.$$

6. Consider a matrix-valued map $A : I \to M_n(\mathbb{R})$, $A(t) = [a_{ij}(t)]$, where $a_{ij} :
I \to \mathbb{R}$ are differentiable maps on some open interval $I$ of $\mathbb{R}$. Let $B_k$ be the
matrix obtained by replacing all entries in the $k$th row of $A$ by their derivatives.
Prove that for all $t \in I$

$$\frac{d}{dt} \det(A(t)) = \sum_{k=1}^n \det(B_k(t)).$$


In the next problems $V$ is a vector space of dimension $n \ge 1$ over a field
$F \in \{\mathbb{R}, \mathbb{C}\}$. If $p$ is a nonnegative integer, we let $\Lambda^p V^*$ be the vector space of
all $p$-linear alternating forms $\omega : V^p = V \times \cdots \times V \to F$, with the convention
that $\Lambda^0 V^* = F$.

7. Prove that $\Lambda^p V^* = 0$ for $p > n$.

8. Prove that if $W$ is a finite dimensional vector space over $F$ and if $f : V \to W$
is a linear map, then $f$ induces a linear map $f^* : \Lambda^p W^* \to \Lambda^p V^*$ defined by

$$f^*(\omega)(v_1, \ldots, v_p) = \omega(f(v_1), \ldots, f(v_p)).$$

9. Prove that if $g : W \to Z$ is a linear map from $W$ to another finite dimensional
vector space $Z$ over $F$, then

$$(g \circ f)^* = f^* \circ g^*$$
as maps $\Lambda^p Z^* \to \Lambda^p V^*$.


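If $\omega \in \Lambda^p V^*$ and $\eta \in \Lambda^q V^*$, we define the exterior product $\omega \wedge \eta$ of $\omega$ and
$\eta$ as the map $\omega \wedge \eta : V^{p+q} \to F$ defined by

$$(\omega \wedge \eta)(v_1, \ldots, v_{p+q}) = \frac{1}{p!\,q!} \sum_{\sigma \in S_{p+q}} \varepsilon(\sigma)\, \omega(v_{\sigma(1)}, \ldots, v_{\sigma(p)})\, \eta(v_{\sigma(p+1)}, \ldots, v_{\sigma(p+q)}).$$

10. Prove that $\omega \wedge \eta \in \Lambda^{p+q} V^*$.
11. Prove that

$$\omega \wedge \eta = (-1)^{pq}\, \eta \wedge \omega.$$
12. Check that for all $\omega_1 \in \Lambda^p V^*$, $\omega_2 \in \Lambda^q V^*$ and $\omega_3 \in \Lambda^r V^*$ we have

$$(\omega_1 \wedge \omega_2) \wedge \omega_3 = \omega_1 \wedge (\omega_2 \wedge \omega_3).$$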
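We define $\omega_1 \wedge \omega_2 \wedge \cdots \wedge \omega_r$ by induction on $r$ as follows:

$$\omega_1 \wedge \cdots \wedge \omega_{r+1} = (\omega_1 \wedge \omega_2 \wedge \cdots \wedge \omega_r) \wedge \omega_{r+1}.$$

13. Check that for all $\omega_1, \ldots, \omega_p \in V^* = \Lambda^1 V^*$ and all $v_1, \ldots, v_p \in V$ we have
$$(\omega_1 \wedge \cdots \wedge \omega_p)(v_1, \ldots, v_p) = \det(\omega_i(v_j)).$$

The right-hand side is by definition the determinant of the matrix $A =
[\omega_i(v_j)] \in M_p(F)$.
14. Prove that $\omega_1, \ldots, \omega_p \in V^* = \Lambda^1 V^*$ are linearly independent if and only if

$$\omega_1 \wedge \cdots \wedge \omega_p \neq 0.$$

15. Let $\omega_1, \ldots, \omega_n$ be a basis of $V^*$. Prove that the family $(\omega_{i_1} \wedge \cdots \wedge \omega_{i_p})_{1 \le i_1 < \cdots < i_p \le n}$
forms a basis of $\Lambda^p V^*$ and deduce that

$$\dim \Lambda^p V^* = \binom{n}{p} = \frac{n!}{p!\,(n-p)!}.$$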
We reach now the heart of this chapter: establishing the main properties of the
determinant map that was introduced in the previous section. Fortunately, we have
developed all the necessary theory to be able to give clean proofs of all important
properties of determinants.

A first very important result is the homogeneity of the determinant map: if
we multiply all entries of a matrix $A \in M_n(F)$ by a scalar $\lambda \in F$, then the
determinant gets multiplied by $\lambda^n$.

Proposition 7.25. We have $\det(\lambda A) = \lambda^n \det A$ for all $A \in M_n(F)$ and all $\lambda \in F$.
Proof. Write $A = [a_{ij}]$; then $\lambda A = [\lambda a_{ij}]$, hence by definition
$$\det(\lambda A) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, \lambda a_{1\sigma(1)} \cdots \lambda a_{n\sigma(n)} = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, \lambda^n a_{1\sigma(1)} \cdots a_{n\sigma(n)} = \lambda^n \det A,$$

as desired. □
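Problem 7.26. Prove that for any $A \in M_n(\mathbb{C})$ we have
$$\det(\overline{A}) = \overline{\det A},$$

where the entries of $\overline{A}$ are the complex conjugates of the entries of $A$.
Solution. Let $A = [a_{ij}]$; then $\overline{A} = [\overline{a_{ij}}]$ and so

$$\det(\overline{A}) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, \overline{a_{1\sigma(1)}} \cdots \overline{a_{n\sigma(n)}} = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, \overline{a_{1\sigma(1)} \cdots a_{n\sigma(n)}} = \overline{\sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}} = \overline{\det A}.$$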


The main property of the determinant is its multiplicative character.

Theorem 7.27. For all linear transformations $T_1, T_2$ on a finite dimensional vector
space $V$ we have

$$\det(T_1 \circ T_2) = \det T_1 \cdot \det T_2.$$
Proof. Let $v_1, \ldots, v_n$ be a basis of $V$. By relation (7.4) and Theorem 7.20 we have
$$\det(T_1 \circ T_2) = \det_{(v_1, \ldots, v_n)}(T_1(T_2(v_1)), \ldots, T_1(T_2(v_n))) = \det T_1 \cdot \det_{(v_1, \ldots, v_n)}(T_2(v_1), \ldots, T_2(v_n)).$$
Relation (7.4) shows that

$$\det_{(v_1, \ldots, v_n)}(T_2(v_1), \ldots, T_2(v_n)) = \det T_2.$$

Combining these two equalities yields the desired result. □


Combining the previous theorem and relation (7.6) we obtain the following
fundamental theorem, which would be quite a pain in the neck to prove directly
from the defining relation (7.5).

Theorem 7.28. For all matrices $A, B \in M_n(F)$ we have
$$\det(AB) = \det A \cdot \det B.$$

Proof. Let $V = F^n$ and let $T_1 : V \to V$ be the linear transformation sending
$X \in V$ to $AX$. Define similarly $T_2$, replacing $A$ by $B$. If $S$ is a linear transformation
on $V$, let $A_S$ be the matrix of $S$ with respect to the canonical basis of $V = F^n$.
Then $A = A_{T_1}$, $B = A_{T_2}$ and $AB = A_{T_1 \circ T_2}$. The result follows directly from the
previous theorem and relation (7.6). □
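Theorem 7.28 and Proposition 7.25 can be spot-checked on random integer matrices. The sketch below uses Python's `random` module with a fixed seed (an arbitrary choice) and the defining formula (7.5); since all arithmetic is on integers, the equalities are tested exactly, not up to rounding.

```python
import random
from itertools import permutations
from math import prod

def det(A):
    """Formula (7.5); the signature is (-1)^(number of inversions)."""
    n = len(A)
    def eps(s):
        return -1 if sum(s[i] > s[j] for i in range(n) for j in range(i + 1, n)) % 2 else 1
    return sum(eps(s) * prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def random_matrix(n, rng):
    return [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
for n in range(1, 5):
    A, B = random_matrix(n, rng), random_matrix(n, rng)
    lam = rng.randint(-3, 3)
    assert det(matmul(A, B)) == det(A) * det(B)                               # Theorem 7.28
    assert det([[lam * a for a in row] for row in A]) == lam ** n * det(A)   # Proposition 7.25
print("all checks passed")
```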


Problem 7.29. An invertible matrix $A \in M_n(\mathbb{R})$ has the property that both $A$ and
$A^{-1}$ have integer entries. Prove that $\det A \in \{-1, 1\}$.
Solution. We have $A \cdot A^{-1} = I_n$, so using the fact that the determinant is
multiplicative and that $\det I_n = 1$ (which follows straight from the definition of
the determinant of a matrix), we obtain

$$1 = \det I_n = \det(A \cdot A^{-1}) = \det A \cdot \det(A^{-1}).$$

Next, recalling the definition of the determinant of a matrix, we notice that if all
entries of the matrix are integers, then the determinant of the matrix is an integer
(since it is obtained by taking sums and differences of products of the entries of the
matrix). Since $A$ and $A^{-1}$ have by hypothesis integer entries, it follows that $\det A$
and $\det A^{-1}$ are two integers, whose product equals 1. Thus $\det A$ is a divisor of 1
and necessarily $\det A \in \{-1, 1\}$. □


Remark 7.30. A much more remarkable result is the following kind of converse:
suppose that $A \in M_n(\mathbb{R})$ is a matrix with integer entries. If $\det A \in \{-1, 1\}$,
then $A$ is invertible and $A^{-1}$ has integer entries. This is fairly difficult to prove with the tools we have
introduced so far! The reader might try to do the case $n = 2$, which is not so difficult.


We can use the previous theorem and Corollary 7.17 to obtain a beautiful
characterization of invertible matrices. The result is stunningly simple to state.

Theorem 7.31. A matrix $A \in M_n(F)$ is invertible if and only if $\det A \neq 0$.

Proof. Suppose that $A$ is invertible, so there is a matrix $B \in M_n(F)$ such that
$AB = BA = I_n$. Taking the determinant yields $\det A \cdot \det B = 1$, thus $\det A \neq 0$.

Conversely, suppose that $\det A \neq 0$ and let $e_1, \ldots, e_n$ be the canonical basis of
$F^n$, and $C_1, \ldots, C_n \in F^n$ the columns of $A$. Then $\det A = \det_{(e_1, \ldots, e_n)}(C_1, \ldots, C_n)$
is nonzero, thus by Corollary 7.17 the vectors $C_1, \ldots, C_n$ are linearly independent.

This means that the linear map $\varphi : F^n \to F^n$ sending $X$ to $AX$ is injective, and
so invertible. Let $\psi$ be its inverse and let $B$ be the matrix of $\psi$ in the canonical
basis of $F^n$. The equalities $\varphi \circ \psi = \psi \circ \varphi = \mathrm{id}$ yield $AB = BA = I_n$, thus $A$ is
invertible. □


Problem 7.32. Let $A$ and $B$ be invertible $n \times n$ matrices with real entries, where $n$
is an odd positive integer. Show that $AB + BA$ is nonzero.

Solution. Suppose that $AB + BA = O_n$, thus $AB = -BA$. Taking the determinant,
we deduce that

$$\det(AB) = (-1)^n \det(BA) = -\det(BA).$$

On the other hand, $\det(AB) = \det A \det B = \det(BA)$, thus the previous equality
yields $2 \det A \det B = 0$. This contradicts the hypothesis that $A$ and $B$ are invertible.
□

Problem 7.33. Let $A$ and $B$ be two square matrices with real coefficients. If $A$ and
$B$ commute, prove that

$$\det(A^2 + B^2) \ge 0.$$
Solution. Since $A$ and $B$ commute, we have
$$A^2 + B^2 = (A + iB)(A - iB).$$
Thus
$$\det(A^2 + B^2) = \det(A + iB) \det(A - iB).$$

But using Problem 7.26 we obtain

$$\det(A - iB) = \det(\overline{A + iB}) = \overline{\det(A + iB)},$$
thus
$$\det(A^2 + B^2) = |\det(A + iB)|^2 \ge 0,$$

as desired. □
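For a concrete instance of Problem 7.33, take any real matrix $A$ and a polynomial in $A$ for $B$, so that $A$ and $B$ commute; below, $B = A^2 + I_2$ is our arbitrary choice. The sketch computes $\det(A^2 + B^2)$ with integers and $|\det(A + iB)|^2$ with Python complex numbers; the entries are small integers, so the floating-point complex arithmetic is exact here.

```python
from itertools import permutations
from math import prod

def det(A):
    """Formula (7.5); works for integer or complex entries."""
    n = len(A)
    def eps(s):
        return -1 if sum(s[i] > s[j] for i in range(n) for j in range(i + 1, n)) % 2 else 1
    return sum(eps(s) * prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(X, Y, c=1):
    """Entrywise X + c*Y."""
    return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[1, 2], [3, 4]]
I2 = [[1, 0], [0, 1]]
B = add(matmul(A, A), I2)                   # B = A^2 + I commutes with A
assert matmul(A, B) == matmul(B, A)

lhs = det(add(matmul(A, A), matmul(B, B)))  # det(A^2 + B^2)
z = det(add(A, B, 1j))                      # det(A + iB)
print(lhs, z, (z * z.conjugate()).real)     # 1321 (-36-5j) 1321.0
```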


Problem 7.34. Let $n$ be an odd integer and let $A, B \in M_n(\mathbb{R})$ be matrices such that
$A^2 + B^2 = O_n$. Prove that $AB - BA$ is not invertible.

Solution. Consider the equality
$$(A + iB)(A - iB) = A^2 + B^2 + i(BA - AB) = i(BA - AB).$$
Taking the determinant yields
$$\det(A + iB) \det(A - iB) = i^n \det(BA - AB).$$

Suppose that $\det(AB - BA) \neq 0$ and note that since $A, B$ have real entries, we have
by Problem 7.26

$$\det(A - iB) = \det(\overline{A + iB}) = \overline{\det(A + iB)}$$
and so
$$|\det(A + iB)|^2 = i^n \det(BA - AB).$$

Since $\det(BA - AB) = (-1)^n \det(AB - BA)$ is a nonzero real number, we deduce that $i^n$ is real, contradicting the
hypothesis that $n$ is odd. Thus

$$\det(AB - BA) = 0$$

and the result follows. □
Remark 7.35. An alternate solution goes as follows. Note that we have
$$(A + iB)(A - iB) = A^2 + B^2 + i(BA - AB) = i(BA - AB)$$

and

$$(A - iB)(A + iB) = A^2 + B^2 - i(BA - AB) = -i(BA - AB).$$
Since

$$\det((A + iB)(A - iB)) = \det(A + iB) \det(A - iB) = \det((A - iB)(A + iB)),$$

and $n$ is odd, we conclude that

$$i^n \det(BA - AB) = (-i)^n \det(BA - AB) = -i^n \det(BA - AB)$$

and hence $\det(BA - AB) = 0$.


Problem 7.36. Let $p, q$ be real numbers such that the equation $x^2 + px + q = 0$
has no real solutions. Prove that if $n$ is odd, then the equation $X^2 + pX + qI_n = O_n$
has no solution in $M_n(\mathbb{R})$.

Solution. Suppose that $X^2 + pX + qI_n = O_n$ for some $X \in M_n(\mathbb{R})$. We can write
this equation as

$$\left(X + \frac{p}{2} I_n\right)^2 = \frac{p^2 - 4q}{4} I_n.$$

Taking the determinant, we deduce that

$$\left(\frac{p^2 - 4q}{4}\right)^n = \left(\det\left(X + \frac{p}{2} I_n\right)\right)^2 \ge 0.$$

This is impossible, since by assumption $p^2 < 4q$ and $n$ is odd. □


Another important property of the determinant of a matrix is its behavior with
respect to the transpose operation. Recall that if $A = [a_{ij}] \in M_n(F)$, then its
transpose ${}^tA$ is the matrix defined by ${}^tA = [a_{ji}]$.

Theorem 7.37. For all matrices $A \in M_n(F)$ we have
$$\det A = \det({}^tA).$$

Proof. By formula (7.5) applied to ${}^tA$ we have

$$\det({}^tA) = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{\sigma(1)1} \cdots a_{\sigma(n)n}.$$
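For any permutation $\sigma$ we have

$$a_{\sigma(1)1} \cdots a_{\sigma(n)n} = a_{1\sigma^{-1}(1)} \cdots a_{n\sigma^{-1}(n)},$$

since $a_{i\sigma^{-1}(i)} = a_{\sigma(j)j}$ with $j = \sigma^{-1}(i)$ (and when $i$ runs over $\{1, 2, \ldots, n\}$, so
does $j$). Using this relation and the observation that $\varepsilon(\sigma^{-1}) = \varepsilon(\sigma)^{-1} = \varepsilon(\sigma)$, we
obtain²

$$\det({}^tA) = \sum_{\sigma \in S_n} \varepsilon(\sigma^{-1})\, a_{1\sigma^{-1}(1)} \cdots a_{n\sigma^{-1}(n)} = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)} = \det A.$$

The result follows. □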


Problem 7.38. Let $A$ be a skew-symmetric matrix (recall that this means that
$A + {}^tA = O_n$) of odd order with real or complex coefficients. Prove that $\det(A) = 0$.

Solution. By hypothesis we have ${}^tA = -A$. Since $\det(A) = \det({}^tA)$, it follows
that

$$\det(A) = \det({}^tA) = \det(-A) = (-1)^n \det(A) = -\det(A).$$

Thus $\det(A)$ must be 0. □
Problem 7.39. Let $A$ be a matrix of odd order. Show that

$$\det(A - {}^tA) = 0.$$
Solution. We have
$${}^t(A - {}^tA) = {}^tA - {}^t({}^tA) = {}^tA - A = -(A - {}^tA),$$

thus the matrix $A - {}^tA$ is skew-symmetric and its determinant must be 0 by
Problem 7.38. □
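Problem 7.40. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be complex numbers. Compute the
determinant
$$\begin{vmatrix} a_1 + b_1 & b_1 & \cdots & b_1 \\ b_2 & a_2 + b_2 & \cdots & b_2 \\ \vdots & \vdots & \ddots & \vdots \\ b_n & b_n & \cdots & a_n + b_n \end{vmatrix}.$$

²Note that when $\sigma$ runs over $S_n$, so does $\sigma^{-1}$.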
Solution. Let $A$ be the matrix whose determinant we need to evaluate. We have
$$\det A = \det({}^tA)$$
and the columns of ${}^tA$ are the vectors $a_1e_1 + b_1v, \ldots, a_ne_n + b_nv$, where $e_1, \ldots, e_n$
is the canonical basis of $\mathbb{C}^n$ and $v$ is the vector all of whose coordinates are equal
to 1. We deduce that³

$$\det({}^tA) = \det(a_1e_1 + b_1v, \ldots, a_ne_n + b_nv).$$
Using the fact that the determinant map is multilinear and alternating, we obtain
$$\det({}^tA) = \det(a_1e_1, \ldots, a_ne_n) + \sum_{i=1}^n \det(a_1e_1, \ldots, a_{i-1}e_{i-1}, b_iv, a_{i+1}e_{i+1}, \ldots, a_ne_n).$$
Indeed, note that $\det(x_1, \ldots, x_n) = 0$ if at least two of the vectors $x_1, \ldots, x_n$ are
multiples of $v$. We conclude that
$$\det({}^tA) = a_1 \cdots a_n + \sum_{i=1}^n a_1 \cdots a_{i-1} b_i a_{i+1} \cdots a_n \det(e_1, \ldots, e_{i-1}, v, e_{i+1}, \ldots, e_n).$$

Since $v = e_1 + \cdots + e_n$ and the determinant map is multilinear and alternating,
we have

$$\det(e_1, \ldots, e_{i-1}, v, e_{i+1}, \ldots, e_n) = \det(e_1, \ldots, e_n) = 1$$

for all $i$. We conclude that

$$\det A = a_1 \cdots a_n + \sum_{i=1}^n b_i \prod_{k \neq i} a_k.$$

□
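The closed form of Problem 7.40 can be compared with a direct evaluation of the determinant. The sketch below builds the matrix whose $i$th row is $(b_i, \ldots, b_i)$ with $a_i$ added on the diagonal, for rational test values of the $a_i$ and $b_i$ chosen by us, and evaluates both sides exactly with `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def det(A):
    """Formula (7.5); the signature is (-1)^(number of inversions)."""
    n = len(A)
    def eps(s):
        return -1 if sum(s[i] > s[j] for i in range(n) for j in range(i + 1, n)) % 2 else 1
    return sum(eps(s) * prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def problem_matrix(a, b):
    """Row i is (b_i, ..., b_i) with a_i added in position i."""
    n = len(a)
    return [[b[i] + (a[i] if i == j else 0) for j in range(n)] for i in range(n)]

def closed_form(a, b):
    """a_1 ... a_n + sum over i of b_i * prod_{k != i} a_k."""
    n = len(a)
    return prod(a) + sum(b[i] * prod(a[k] for k in range(n) if k != i) for i in range(n))

a = [Fraction(2), Fraction(-3), Fraction(1, 2), Fraction(7)]
b = [Fraction(1), Fraction(4), Fraction(-2), Fraction(5, 3)]
print(det(problem_matrix(a, b)) == closed_form(a, b))  # True
```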


Recall that a matrix $A \in M_n(F)$ is called upper-triangular if all entries of $A$
below the main diagonal are 0, that is $a_{ij} = 0$ whenever $i > j$. Similarly, $A$ is
called lower-triangular if all entries above the main diagonal are 0, that is $a_{ij} = 0$
whenever $i < j$. The result of the following computation is absolutely crucial:
the determinant of an upper-triangular or lower-triangular square matrix is
simply the product of the diagonal entries. One can hardly overestimate the
power of this innocent-looking statement.

³We simply write $\det$ instead of $\det_{(e_1, \ldots, e_n)}$.
Theorem 7.41. If $A = [a_{ij}] \in M_n(F)$ is upper-triangular or lower-triangular,
then

$$\det A = \prod_{i=1}^n a_{ii}.$$

Proof. The argument being identical in the lower-triangular case, let us assume that
$A$ is upper-triangular. Consider a nonzero term $\varepsilon(\sigma) a_{1\sigma(1)} \cdots a_{n\sigma(n)}$ appearing in the
right-hand side of formula (7.5). Then each $a_{i\sigma(i)}$ is nonzero and so necessarily
$i \le \sigma(i)$ for all $i$. But since $\sum_{i=1}^n i = \sum_{i=1}^n \sigma(i)$ (as $\sigma$ is a permutation), all
the previous inequalities must be equalities. Thus $\sigma$ is the identity permutation and
the corresponding term is $a_{11} \cdots a_{nn}$. Since all other terms are 0, the theorem is
proved. □


Problem 7.42. For $1 \le i, j \le n$ we let $a_{ij}$ be the number of common positive divisors of $i$ and $j$, and we let $b_{ij} = 1$ if $j$ divides $i$, and $b_{ij} = 0$ otherwise.

a) Prove that $A = B \cdot {}^tB$, where $A = [a_{ij}]$ and $B = [b_{ij}]$.
b) What can you say about the shape of the matrix $B$?
c) Compute $\det A$.


Solution. a) Let us fix $i, j \in \{1, 2, \dots, n\}$ and compute, using the product rule,

$$(B \cdot {}^tB)_{ij} = \sum_{k=1}^{n} b_{ik}b_{jk}.$$

Consider a nonzero term $b_{ik}b_{jk}$ in the previous sum. Since $b_{ik}$ and $b_{jk}$ are nonzero, $k$ must divide both $i$ and $j$, that is $k$ is a common positive divisor of $i$ and $j$. Conversely, if $k$ is a common positive divisor of $i$ and $j$, then $b_{ik} = b_{jk} = 1$. We deduce that the only nonzero terms in the sum are those corresponding to common positive divisors of $i$ and $j$, and each such nonzero term equals 1. Thus $(B \cdot {}^tB)_{ij}$ is the number of common positive divisors of $i$ and $j$, which by definition of $A$ is simply $a_{ij}$. Since $i, j$ were arbitrary, we deduce that $A = B \cdot {}^tB$.

b) If $i < j$ are between 1 and $n$, then certainly $j$ cannot divide $i$ and so $b_{ij} = 0$. Thus $b_{ij} = 0$ whenever $i < j$, which means that $B$ is lower-triangular. We can say a little more: since $i$ divides $i$ for all $i \in \{1, 2, \dots, n\}$, we have $b_{ii} = 1$, thus all diagonal entries of $B$ are equal to 1.

c) Since the determinant is multiplicative and since $\det B = \det({}^tB)$, we have (using part a))

$$\det A = \det(B \cdot {}^tB) = (\det B)^2.$$

We can now use part b) and the previous theorem to conclude that $\det B = 1$ and so

$$\det A = 1. \qquad \square$$
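Parts a) and c) are easy to confirm numerically for a particular $n$. The Python sketch below does this for $n = 8$, an arbitrary choice of ours; the helper names are also hypothetical. It builds $A$ and $B$, checks that $A = B \cdot {}^tB$ and that $B$ is lower-triangular, and evaluates $\det A$ exactly. For the determinant it uses Gaussian elimination over the rationals, via Python's standard `fractions` module.

```python
from fractions import Fraction


def det(m):
    """Exact determinant: reduce to upper-triangular form with row swaps and transvections."""
    m = [[Fraction(x) for x in row] for row in m]
    n, result = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return result


def common_divisor_count(i, j):
    return sum(1 for k in range(1, min(i, j) + 1) if i % k == 0 and j % k == 0)


n = 8
idx = range(1, n + 1)
A = [[common_divisor_count(i, j) for j in idx] for i in idx]
B = [[1 if i % j == 0 else 0 for j in idx] for i in idx]  # b_ij = 1 iff j divides i
B_Bt = [[sum(B[i][k] * B[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
print(A == B_Bt, det(A))  # True 1
```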
The next theorem (which can be very useful in practice) would also be quite
painful to prove directly by manipulating the complicated expression defining the 
determinant of a matrix. The theory of alternating multilinear forms makes the proof 
completely transparent. 


Theorem 7.43 (Block-Determinants). Let $A \in M_n(F)$ be a matrix given in block form

$$A = \begin{bmatrix} B & D \\ O_{q,p} & C \end{bmatrix},$$

where $B \in M_p(F)$, $C \in M_q(F)$ (with $p + q = n$) and $D \in M_{p,q}(F)$. Then

$$\det A = \det B \cdot \det C.$$


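Proof. Consider the map

$$\varphi : (F^p)^p \to F, \quad \varphi(X_1, \dots, X_p) = \det\begin{bmatrix} X & D \\ O_{q,p} & C \end{bmatrix},$$

where $X \in M_p(F)$ is the matrix with columns $X_1, \dots, X_p$. The determinant map on $(F^n)^n$ (with respect to the canonical basis of $F^n$) is linear with respect to each variable, so $\varphi$ is $p$-linear. Moreover, $\varphi$ is alternating: if $X_i = X_j$ for some $i \neq j$, then columns $i$ and $j$ in the matrix $\begin{bmatrix} X & D \\ O_{q,p} & C \end{bmatrix}$ are equal and so this matrix has determinant 0.

Now, applying Theorem 7.14 to the canonical basis of $F^p$, we obtain

$$\det\begin{bmatrix} X & D \\ O_{q,p} & C \end{bmatrix} = \det X \cdot \det\begin{bmatrix} I_p & D \\ O_{q,p} & C \end{bmatrix}$$

for all $X \in M_p(F)$. We can play the same game with the map $Y \mapsto \det\begin{bmatrix} I_p & D \\ O_{q,p} & Y \end{bmatrix}$. This map is $q$-linear and alternating as a function of the rows of $Y$: these are the last $q$ rows of the matrix, padded on the left by zeros, and $\det({}^tM) = \det M$. It yields

$$\det\begin{bmatrix} I_p & D \\ O_{q,p} & Y \end{bmatrix} = \det Y \cdot \det\begin{bmatrix} I_p & D \\ O_{q,p} & I_q \end{bmatrix}.$$

Thus

$$\det A = \det B \cdot \det C \cdot \det\begin{bmatrix} I_p & D \\ O_{q,p} & I_q \end{bmatrix} = \det B \cdot \det C.$$

The last equality holds because the matrix $\begin{bmatrix} I_p & D \\ O_{q,p} & I_q \end{bmatrix}$ is upper-triangular with diagonal entries equal to 1, so its determinant equals 1. □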
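The following sketch gives a numerical illustration of Theorem 7.43. We fill $B$, $C$ and $D$ with pseudo-random integers, using a seeded generator for reproducibility. The sizes $p = 2$, $q = 3$ and the helper names are arbitrary choices of ours. We then assemble $A$ with a zero block $O_{q,p}$ and compare $\det A$ with $\det B \cdot \det C$. The determinant is computed from the permutation expansion, which is fast enough for $n = 5$.

```python
import random
from itertools import permutations
from math import prod


def sign(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det(m):
    n = len(m)
    return sum(sign(p) * prod(m[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def random_matrix(rows, cols, rng):
    return [[rng.randint(-5, 5) for _ in range(cols)] for _ in range(rows)]


def block_upper(B, D, C):
    """Assemble [[B, D], [O, C]] from B (p x p), D (p x q) and C (q x q)."""
    p, q = len(B), len(C)
    return [B[i] + D[i] for i in range(p)] + [[0] * p + C[i] for i in range(q)]


rng = random.Random(0)
p, q = 2, 3
B, C, D = random_matrix(p, p, rng), random_matrix(q, q, rng), random_matrix(p, q, rng)
A = block_upper(B, D, C)
print(det(A) == det(B) * det(C))  # True
```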




Problem 7.44. Let $A \in M_n(\mathbb{C})$ and let $T : M_n(\mathbb{C}) \to M_n(\mathbb{C})$ be the map defined by $T(X) = AX$. Find the determinant of $T$.

Solution. Let $(E_{ij})_{1 \le i,j \le n}$ be the canonical basis of $M_n(\mathbb{C})$. Note that

$$T(E_{ij}) = AE_{ij} = \sum_{k=1}^{n} a_{ki}E_{kj},$$

as a direct inspection of the product $AE_{ij}$ shows. We deduce that the matrix of $T$ with respect to the basis $E_{11}, \dots, E_{n1}, E_{12}, \dots, E_{n2}, \dots, E_{1n}, \dots, E_{nn}$ is a block-diagonal matrix with $n$ diagonal blocks equal to $A$. It follows from Theorem 7.43 (applied repeatedly) that

$$\det T = (\det A)^n.$$
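To see the block structure concretely, the sketch below writes down the $n^2 \times n^2$ matrix of $T$ in the ordered basis $E_{11}, \dots, E_{n1}, E_{12}, \dots, E_{nn}$ used above. It then compares $\det T$ with $(\det A)^n$ for one $3 \times 3$ example. The code is plain Python with exact fractions; the example matrix and all helper names are our own, and indices are 0-based.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, result = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return result


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def unit(n, i, j):
    """The matrix E_ij (0-based indices)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]


A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
n = len(A)
basis = [(i, j) for j in range(n) for i in range(n)]  # E_11, E_21, ..., E_n1, E_12, ...


def coords(X):
    """Coordinates of X in the ordered basis above (column by column)."""
    return [X[i][j] for (i, j) in basis]


columns = [coords(matmul(A, unit(n, i, j))) for (i, j) in basis]
T = [[columns[c][r] for c in range(n * n)] for r in range(n * n)]  # column c holds T(basis[c])
print(det(T) == det(A) ** n)  # True; here det A = -34
```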


7.3.1 Problems for Practice 


1. A 5x 5 matrix A with real entries has determinant 2. Compute the determinant 
of 2A, —3A, A”, —A?, (AF. 

2. Prove that the determinant of an orthogonal matrix $A \in M_n(\mathbb{R})$ equals $-1$ or $1$. We recall that $A$ is orthogonal if $A \cdot {}^tA = I_n$.

3. a) A matrix $A \in M_n(\mathbb{R})$ satisfies $A^3 = I_n$. What are the possible values of $\det A$?

b) Answer the same question with $\mathbb{R}$ replaced by $\mathbb{C}$.
c) Answer the same question with $\mathbb{R}$ replaced by $\mathbb{F}_3$.

4. Prove that for all $A \in M_n(\mathbb{R})$ we have

$$\det(A \cdot {}^tA) \ge 0.$$


5. If $A = [a_{ij}] \in M_n(\mathbb{C})$, define $A^* = [\overline{a_{ji}}] \in M_n(\mathbb{C})$.

a) Express $\det(A^*)$ in terms of $\det A$.
b) Prove that $\det(A \cdot A^*) \ge 0$.


6. Let $T$ be a linear transformation on a finite dimensional vector space $V$. Suppose that $V = V_1 \oplus V_2$ for some subspaces $V_1, V_2$ which are stable under $T$. Let $T_1, T_2$ be the restrictions of $T$ to $V_1, V_2$. Prove that

$$\det T = \det T_1 \cdot \det T_2.$$


7. The entries of a matrix $A \in M_n(\mathbb{R})$ are equal to $-1$ or $1$. Prove that $2^{n-1}$ divides $\det A$.


8. Prove that for any matrix $A \in M_n(\mathbb{R})$ we have

a) $\det(A^2 + I_n) \ge 0$.
b) $\det(A^2 + A + I_n) \ge 0$.


9. Prove that if $A, B \in M_n(\mathbb{R})$ are matrices which commute, then

$$\det(A^2 + AB + B^2) \ge 0.$$

10. Let $A, B, C \in M_n(\mathbb{R})$ be pairwise commuting matrices. Prove that

$$\det(A^2 + B^2 + C^2 - AB - BC - CA) \ge 0.$$

Hint: express $A^2 + B^2 + C^2 - AB - BC - CA$ simply in terms of the matrices $X = A - B$ and $Y = B - C$.
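11. Let $A \in M_n(\mathbb{C})$ and consider the matrix

$$B = \begin{bmatrix} I_n & A \\ A & I_n \end{bmatrix} \in M_{2n}(\mathbb{C}).$$

a) Prove that $\det B = \det(I_n - A) \cdot \det(I_n + A)$. Hint: start by proving the equality

$$\begin{bmatrix} I_n & A \\ A & I_n \end{bmatrix} = \begin{bmatrix} I_n & O_n \\ A & I_n - A^2 \end{bmatrix}\begin{bmatrix} I_n & A \\ O_n & I_n \end{bmatrix}.$$

b) If $B$ is invertible, prove that $I_n - A^2$ is invertible and compute the inverse of $B$ in terms of $A$ and the inverse of $I_n - A^2$.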


12. Prove that for all matrices $A, B \in M_n(\mathbb{R})$ we have

$$\det\begin{bmatrix} A & -B \\ B & A \end{bmatrix} = |\det(A + iB)|^2.$$


13. Let $A, B \in M_n(\mathbb{R})$ be matrices such that $A^2 + B^2 = AB$ and $AB - BA$ is invertible.

a) Let $j = e^{2\pi i/3}$, so that $j^2 + j + 1 = 0$. Check that
$$(A + jB)(A + j^2B) = j(BA - AB).$$
b) Prove that $n$ is a multiple of 3.


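14. (Matrices differing by a rank one matrix) Let $A \in GL_n(F)$ be an invertible matrix and $v, w \in M_{n,1}(F)$ be vectors thought of as $n \times 1$ matrices.

a) Show that

$$\det(A - v \cdot {}^tw) = \det(A)(1 - {}^twA^{-1}v),$$

where we think of the $1 \times 1$ matrix ${}^twA^{-1}v$ as a scalar.
Hint. One way to prove this formula is to justify the block matrix formula

$$\begin{bmatrix} 1 - {}^twA^{-1}v & {}^tw \\ 0 & A \end{bmatrix}\begin{bmatrix} 1 & 0 \\ A^{-1}v & I_n \end{bmatrix} = \begin{bmatrix} 1 & {}^tw \\ v & A \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ v & I_n \end{bmatrix}\begin{bmatrix} 1 & {}^tw \\ 0 & A - v \cdot {}^tw \end{bmatrix}.$$

b) If furthermore ${}^twA^{-1}v \neq 1$, show that

$$(A - v \cdot {}^tw)^{-1} = A^{-1} + \frac{1}{1 - {}^twA^{-1}v}\,A^{-1}v \cdot {}^twA^{-1}.$$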


15. (Determinants of block matrices) Let $X \in M_n(F)$ be a matrix given in block form

$$X = \begin{bmatrix} A & B \\ C & D \end{bmatrix},$$

where $A \in M_p(F)$, $B \in M_{p,q}(F)$, $C \in M_{q,p}(F)$, $D \in M_q(F)$, and $p + q = n$. If $A$ is invertible, show that

$$\det(X) = \det(A)\det(D - CA^{-1}B).$$

Hint. Generalize the block matrix formula from the preceding problem.

16. (Smith's determinant) For $1 \le i, j \le n$ let $x_{ij}$ be the greatest common divisor of $i$ and $j$. The goal of this problem is to compute $\det X$, where $X = [x_{ij}]_{1 \le i,j \le n}$.

Let $\varphi$ be Euler's totient function (i.e., $\varphi(1) = 1$ and, for $n \ge 2$, $\varphi(n)$ is the number of positive integers less than $n$ and relatively prime to $n$). Define $y_{ij} = \varphi(j)$ if $j$ divides $i$, and $y_{ij} = 0$ otherwise. Also, let $b_{ij} = 1$ if $j$ divides $i$ and 0 otherwise.

a) Prove that $X = Y \cdot {}^tB$, where $Y = [y_{ij}]$ and $B = [b_{ij}]$.
b) Prove that

$$\det X = \varphi(1)\varphi(2)\cdots\varphi(n).$$
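If $n = 2$, then $S_2$ contains only the permutations $\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}$ and $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$, hence we get

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$

We have used this formula extensively in the chapter devoted to square matrices of order 2. The value of the determinant of a matrix of order two may be remembered as the product of the entries on the main diagonal (top left to bottom right) minus the product of the entries on the other diagonal (top right to bottom left).

If $n = 3$, then $S_3$ contains six permutations:

$$\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}.$$

The first three permutations are even and the last three are odd. In this case we get

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{13}a_{21}a_{32} + a_{12}a_{23}a_{31} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33}.$$

The value of the determinant of order three may be remembered using a scheme similar to that used for determinants of order two:

$$\begin{array}{ccc|cc} a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\ a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\ a_{31} & a_{32} & a_{33} & a_{31} & a_{32} \end{array}$$

That is, the first two columns of the determinant are repeated at its right. We record the products of the three elements along each of the three diagonals running downward and to the right. We also record the negatives of the products of the three elements along each of the three diagonals running upward and to the right. The algebraic sum of these six products is the value of the determinant.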
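The scheme just described is commonly known as the rule of Sarrus, and it translates directly into code. The Python sketch below is our own illustration, and the function name `sarrus` is hypothetical. It is valid only for $3 \times 3$ matrices. The checks in the usage line reuse the matrices of the example and problems that follow.

```python
def sarrus(m):
    """Determinant of a 3 x 3 matrix by the diagonal scheme (valid only for n = 3)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    down = a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32  # diagonals going down-right
    up = a13 * a22 * a31 + a11 * a23 * a32 + a12 * a21 * a33    # diagonals going up-right
    return down - up


print(sarrus([[1, 4, 0], [-1, 2, 1], [2, 0, 1]]))  # 14
```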
For example, applying this scheme we get

$$\begin{vmatrix} 1 & 4 & 0 \\ -1 & 2 & 1 \\ 2 & 0 & 1 \end{vmatrix} = 2 + 8 + 0 - 0 - 0 - (-4) = 14.$$


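Problem 7.45. Compute the determinant

$$\begin{vmatrix} 1 & 2 & 3 \\ 2 & -3 & 5 \\ 3 & 1 & -2 \end{vmatrix}.$$

Solution. Using the rule described above, we obtain

$$\begin{vmatrix} 1 & 2 & 3 \\ 2 & -3 & 5 \\ 3 & 1 & -2 \end{vmatrix} = 6 + 30 + 6 + 27 - 5 + 8 = 72.$$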
3 1 -2 


Problem 7.46. Consider the invertible matrix

$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$

Find the determinant of the inverse of $A$.

Solution. There is no need to compute $A^{-1}$ explicitly in order to solve the problem. Indeed, since $A \cdot A^{-1} = I_3$, we have $\det A \cdot \det(A^{-1}) = 1$ and so

$$\det(A^{-1}) = \frac{1}{\det A}.$$

It therefore suffices to compute $\det A$. Now, using the previous rule, we obtain

$$\det A = 4 + 1 + 1 - 1 - 2 - 2 = 1.$$

Thus

$$\det(A^{-1}) = 1.$$
No such easy rules exist for matrices of size at least 4. In practice, the following properties of the determinant allow one to compute a large number of determinants. Below, $R_1, R_2, \dots, R_n$ denote the rows of the matrix whose determinant we are asked to compute or study.

1. If every element of a row of a determinant of order $n$ is multiplied by the scalar $\lambda$, then the value of the determinant is multiplied by $\lambda$.

2. If two rows of a determinant are interchanged, then the determinant gets multiplied by $-1$. More generally, we have the following formula, where $\sigma$ is any permutation in $S_n$:

$$\det\begin{bmatrix} R_{\sigma(1)} \\ R_{\sigma(2)} \\ \vdots \\ R_{\sigma(n)} \end{bmatrix} = \varepsilon(\sigma)\det\begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \end{bmatrix}.$$

3. Adding a scalar multiple of a row of a determinant to another row does not change the value of the determinant: for $j \neq k$ and $\lambda \in F$ we have

$$\det\begin{bmatrix} R_1 \\ \vdots \\ R_j + \lambda R_k \\ \vdots \\ R_n \end{bmatrix} = \det\begin{bmatrix} R_1 \\ \vdots \\ R_j \\ \vdots \\ R_n \end{bmatrix}.$$

4. A very useful property is that the determinant of an upper (or lower) triangular matrix is simply the product of its diagonal entries.


Note that the operations involved are the elementary row operations studied in Chap. 3. This gives us a practical way of computing determinants, known as the Gaussian elimination algorithm for determinants. Start with the matrix $A$ whose determinant we want to evaluate, and perform Gaussian reduction to bring it to its reduced row-echelon form. This requires several elementary operations on the rows of $A$, which amount to multiplying $A$ on the left by a sequence of elementary matrices $E_1, \dots, E_k$. Thus at the end we obtain

$$E_1E_2\cdots E_kA = A_{\text{ref}},$$

where $A_{\text{ref}}$ is the reduced row-echelon form of $A$. Taking determinants gives

$$\det E_1 \cdot \det E_2 \cdots \det E_k \cdot \det A = \det A_{\text{ref}}.$$

Since $A_{\text{ref}}$ is upper-triangular, its determinant is simply the product of its diagonal entries (in particular, if some diagonal entry equals 0, then $\det A = 0$). Also, the previous rules 1–3 allow us to compute very easily each of the factors $\det E_1, \det E_2, \dots, \det E_k$. We can neglect those matrices $E_i$ which correspond to transvections, as their determinant is 1. Next, if $E_i$ corresponds to multiplication of a row by a nonzero scalar $\lambda$, then $\det E_i = \lambda$. If $E_i$ corresponds to the interchange of two rows, then $\det E_i = -1$. Thus, in practice we simply follow the Gaussian reduction to bring the matrix to its reduced row-echelon form. At each step we keep track of the factor $\det E_i$ by which the operation performed multiplies the determinant. Then $\det A$ equals $\det A_{\text{ref}}$ divided by the product of these factors.
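The algorithm just described can be sketched in a few lines of Python. This is a minimal version under our own assumptions, and the function name `det_gauss` is hypothetical. It uses exact rational arithmetic (the standard `fractions` module) and only row interchanges and transvections, so the only factor to track is a sign. It also stops at an upper-triangular matrix instead of the full reduced row-echelon form, a shortcut that Remark 7.47 b) below also recommends.

```python
from fractions import Fraction


def det_gauss(rows):
    """Determinant via row reduction to upper-triangular form, tracking row interchanges."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    sign = 1
    for c in range(n):
        # Look for a row at or below position c with a nonzero entry in column c.
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)  # no pivot in this column: the determinant vanishes
        if piv != c:
            m[c], m[piv] = m[piv], m[c]  # interchanging two rows multiplies det by -1
            sign = -sign
        for r in range(c + 1, n):
            # Transvection R_r <- R_r - f * R_c: the determinant is unchanged.
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]  # triangular matrix: product of the diagonal entries
    return result


print(det_gauss([[1, 1, 1, 1], [1, 1, 1, -3], [2, 2, -2, -2], [3, -1, -1, -1]]))  # 64
```

The matrix in the usage line is the one of Problem 7.62 below. Choosing the first nonzero pivot is enough for exact arithmetic. With floating-point entries one would normally pick the largest pivot in absolute value instead.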


Remark 7.47. a) Note that since $\det({}^tA) = \det A$ for all $A \in M_n(F)$, all previous properties referring to rows of a matrix (or determinant) still hold when the word row is replaced with the word column.

b) For any particular problem an intelligent human being can probably do better than the naive Gaussian elimination. The most common way is to be opportunistic and produce more zeros in the matrix with carefully chosen row and/or column operations. Two more systematic ways are:

• If there is some linear dependence among the columns (or rows), then the determinant vanishes, which gives an early exit from the algorithm. Note that this is the case if a column (or row) consists entirely of zeros, or if there are two equal columns or two equal rows.

• Since we can easily compute the determinant of an upper-triangular matrix, we do not need to fully reduce the matrix; it suffices to bring it to triangular form.

c) There are some extra rules one could exploit (they are, however, more useful in theoretical questions):

• If a column is decomposed as the sum of two column vectors, then the determinant is the sum of the corresponding two determinants, i.e.

$$\det[c_1\ c_2\ \cdots\ c_k + c_k'\ \cdots\ c_n] = \det[c_1\ c_2\ \cdots\ c_k\ \cdots\ c_n] + \det[c_1\ c_2\ \cdots\ c_k'\ \cdots\ c_n].$$

A similar statement applies to rows.

• If $A \in M_n(\mathbb{C})$, then the determinant of the conjugate matrix of $A$ equals the conjugate of the determinant of $A$, i.e.

$$\det \overline{A} = \overline{\det A}.$$

• For $A, B \in M_n(F)$ we have
$$\det(A \cdot B) = \det A \cdot \det B.$$

• If $A \in M_n(F)$ and $\lambda \in F$, then

$$\det(\lambda A) = \lambda^n \det A.$$
Problem 7.48. Prove that for all real numbers $a, b, c$ we have

$$\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ b+c & c+a & a+b \end{vmatrix} = 0.$$

Solution. Adding the second row to the third row yields

$$\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ b+c & c+a & a+b \end{vmatrix} = \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a+b+c & a+b+c & a+b+c \end{vmatrix}.$$

Since the third row is proportional to the first row, this last determinant vanishes. □


Problem 7.49. Let $a, b, c$ be complex numbers. By computing the determinant of the matrix

$$A = \begin{bmatrix} a & b & c \\ b & c & a \\ c & a & b \end{bmatrix}$$

in two different ways, prove that

$$a^3 + b^3 + c^3 - 3abc = (a + b + c)(a^2 + b^2 + c^2 - ab - bc - ca).$$

Solution. First, we can compute $\det A$ using the rule described at the beginning of this section. We end up with

$$\det A = abc + abc + abc - a^3 - b^3 - c^3 = -(a^3 + b^3 + c^3 - 3abc).$$

On the other hand, we can add all columns to the first column and obtain

$$\det A = \begin{vmatrix} a+b+c & b & c \\ a+b+c & c & a \\ a+b+c & a & b \end{vmatrix} = (a+b+c)\begin{vmatrix} 1 & b & c \\ 1 & c & a \\ 1 & a & b \end{vmatrix}.$$

The last determinant can be computed using the rule described at the beginning of the section. We obtain

$$\begin{vmatrix} 1 & b & c \\ 1 & c & a \\ 1 & a & b \end{vmatrix} = bc + ab + ca - c^2 - a^2 - b^2 = -(a^2 + b^2 + c^2 - ab - bc - ca).$$

Thus

$$\det A = -(a + b + c)(a^2 + b^2 + c^2 - ab - bc - ca).$$

Comparing the two expressions for $\det A$ yields the desired result. □
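Problem 7.50. Prove that

$$\begin{vmatrix} b_1+c_1 & c_1+a_1 & a_1+b_1 \\ b_2+c_2 & c_2+a_2 & a_2+b_2 \\ b_3+c_3 & c_3+a_3 & a_3+b_3 \end{vmatrix} = 2\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}$$

for all real numbers $a_1, a_2, a_3, b_1, b_2, b_3, c_1, c_2, c_3$.

Solution. Performing the indicated operations on the columns of the corresponding matrices, we have the following chain of equalities:

$$\begin{aligned}
\begin{vmatrix} b_1+c_1 & c_1+a_1 & a_1+b_1 \\ b_2+c_2 & c_2+a_2 & a_2+b_2 \\ b_3+c_3 & c_3+a_3 & a_3+b_3 \end{vmatrix}
&\overset{C_1 \to C_1 - C_2}{=} \begin{vmatrix} b_1-a_1 & c_1+a_1 & a_1+b_1 \\ b_2-a_2 & c_2+a_2 & a_2+b_2 \\ b_3-a_3 & c_3+a_3 & a_3+b_3 \end{vmatrix} \\
&\overset{C_1 \to C_1 - C_3}{=} \begin{vmatrix} -2a_1 & c_1+a_1 & a_1+b_1 \\ -2a_2 & c_2+a_2 & a_2+b_2 \\ -2a_3 & c_3+a_3 & a_3+b_3 \end{vmatrix} = -2\begin{vmatrix} a_1 & c_1+a_1 & a_1+b_1 \\ a_2 & c_2+a_2 & a_2+b_2 \\ a_3 & c_3+a_3 & a_3+b_3 \end{vmatrix} \\
&\overset{\substack{C_2 \to C_2 - C_1 \\ C_3 \to C_3 - C_1}}{=} -2\begin{vmatrix} a_1 & c_1 & b_1 \\ a_2 & c_2 & b_2 \\ a_3 & c_3 & b_3 \end{vmatrix} \overset{C_2 \leftrightarrow C_3}{=} 2\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}.
\end{aligned}$$

The result follows. □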


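Remark 7.51. An alternative and shorter solution to the previous problem is to note that

$$\begin{bmatrix} b_1+c_1 & c_1+a_1 & a_1+b_1 \\ b_2+c_2 & c_2+a_2 & a_2+b_2 \\ b_3+c_3 & c_3+a_3 & a_3+b_3 \end{bmatrix} = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix}\cdot\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$

and that the last matrix has determinant 2.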
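Problem 7.52. Compute $\det A$, where $x_1, \dots, x_n$ are real numbers and

$$A = \begin{bmatrix} 1+x_1 & x_2 & x_3 & \cdots & x_n \\ x_1 & 1+x_2 & x_3 & \cdots & x_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_1 & x_2 & x_3 & \cdots & 1+x_n \end{bmatrix}.$$

Solution. We start by adding all the other columns to the first column, and factoring out $1 + x_1 + \dots + x_n$. We obtain

$$\det A = (1 + x_1 + \dots + x_n)\begin{vmatrix} 1 & x_2 & \cdots & x_n \\ 1 & 1+x_2 & \cdots & x_n \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_2 & \cdots & 1+x_n \end{vmatrix}.$$

In this new determinant, from each row starting with the second we subtract the first row. We end up with

$$\det A = (1 + x_1 + \dots + x_n)\begin{vmatrix} 1 & x_2 & \cdots & x_n \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{vmatrix}.$$

The last determinant is that of an upper-triangular matrix with diagonal entries equal to 1, thus it equals 1. We conclude that

$$\det A = 1 + x_1 + x_2 + \dots + x_n.$$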


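Problem 7.53. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be the matrix defined by

$$a_{ij} = \begin{cases} n+1, & \text{if } i = j, \\ 1, & \text{if } i \neq j. \end{cases}$$

Compute $\det A$.

Solution. The matrix $A$ can be written in the form

$$A = \begin{bmatrix} n+1 & 1 & \cdots & 1 \\ 1 & n+1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & n+1 \end{bmatrix}.$$

Adding all columns of $A$ to the first column we obtain

$$\det A = \begin{vmatrix} 2n & 1 & \cdots & 1 \\ 2n & n+1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 2n & 1 & \cdots & n+1 \end{vmatrix} = 2n\begin{vmatrix} 1 & 1 & \cdots & 1 \\ 1 & n+1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & n+1 \end{vmatrix}.$$

The last determinant can be computed by subtracting the first column from each of the other columns, and noting that the resulting matrix is lower-triangular. We obtain

$$\det A = 2n\begin{vmatrix} 1 & 0 & \cdots & 0 \\ 1 & n & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & \cdots & n \end{vmatrix} = 2n \cdot n^{n-1} = 2n^n.$$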
Remark 7.54. The previous two problems are special cases of Problem 7.40.
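Both closed forms are easy to test for small sizes. The sketch below computes the determinants exactly by elimination over the rationals. It compares them with $1 + x_1 + \dots + x_n$ for Problem 7.52 and with $2n \cdot n^{n-1} = 2n^n$ for Problem 7.53. The sample values of $x$, the range of $n$ and the helper names are arbitrary choices of ours.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, result = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return result


def problem_7_52(x):
    """Entry (i, j) is x_j, plus 1 on the diagonal."""
    n = len(x)
    return [[x[j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]


def problem_7_53(n):
    """n + 1 on the diagonal, 1 elsewhere."""
    return [[n + 1 if i == j else 1 for j in range(n)] for i in range(n)]


x = [3, -1, Fraction(1, 2), 4]
print(det(problem_7_52(x)) == 1 + sum(x))                            # True
print(all(det(problem_7_53(n)) == 2 * n ** n for n in range(1, 7)))  # True
```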


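Problem 7.55. Prove that for all real numbers $a, b, c$

$$\begin{vmatrix} \cos^2 a & \sin^2 a & \cos 2a \\ \cos^2 b & \sin^2 b & \cos 2b \\ \cos^2 c & \sin^2 c & \cos 2c \end{vmatrix} = 0.$$

Solution. Subtracting the second column from the first one and using the identity $\cos^2 x - \sin^2 x = \cos 2x$, we have

$$\begin{vmatrix} \cos^2 a & \sin^2 a & \cos 2a \\ \cos^2 b & \sin^2 b & \cos 2b \\ \cos^2 c & \sin^2 c & \cos 2c \end{vmatrix} = \begin{vmatrix} \cos^2 a - \sin^2 a & \sin^2 a & \cos 2a \\ \cos^2 b - \sin^2 b & \sin^2 b & \cos 2b \\ \cos^2 c - \sin^2 c & \sin^2 c & \cos 2c \end{vmatrix} = \begin{vmatrix} \cos 2a & \sin^2 a & \cos 2a \\ \cos 2b & \sin^2 b & \cos 2b \\ \cos 2c & \sin^2 c & \cos 2c \end{vmatrix} = 0,$$

the last determinant having two equal columns. The result follows. □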


Problem 7.56. Let $A \in M_n(\mathbb{R})$.

a) Show that if $n^2 - n + 1$ entries in $A$ are equal to 0, then $\det(A) = 0$.
b) Show that one can choose $A$ such that $\det A \neq 0$ and $A$ has $n^2 - n + 1$ equal entries.
c) Show that if $n^2 - n + 2$ entries in $A$ are equal, then $\det(A) = 0$.

Solution. a) We claim that the matrix $A$ has a column consisting entirely of zeros, which implies $\det A = 0$. Indeed, if each column of $A$ has at most $n - 1$ zeros, then $A$ has at most $n(n - 1)$ zero entries in total. This contradicts the hypothesis.

b) Consider the matrix $A$ whose entries off the main diagonal are equal to 1 and whose diagonal entries are $1, 2, \dots, n$. Then $n^2 - n + 1$ entries are equal to 1, but $\det A \neq 0$. Indeed, subtracting the first row from each subsequent row yields an upper-triangular matrix with nonzero diagonal entries $1, 1, 2, \dots, n - 1$, thus an invertible matrix.

c) If $n^2 - n + 2$ entries in $A$ are equal (say to some number $a$), then there are at most $n - 2$ entries of $A$ that are not equal to $a$. Thus at most $n - 2$ columns of $A$ contain an entry which is not equal to $a$. Said differently, at least 2 columns of $A$ have all entries equal to $a$. But then $A$ has two equal columns and $\det(A) = 0$. □

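Problem 7.57 (The Vandermonde Determinant). Let $a_1, a_2, \dots, a_n$ be complex numbers. Prove that the determinant of the matrix $A = [a_i^{j-1}]_{1 \le i,j \le n}$ (where if necessary we interpret $0^0$ as being 1) equals

$$\det(A) = \prod_{1 \le i < j \le n}(a_j - a_i).$$

Solution. Starting from the right-hand side and working left, subtract $a_1$ times each column from the column to its right. This gives

$$\det A = \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & a_2 - a_1 & (a_2 - a_1)a_2 & \cdots & (a_2 - a_1)a_2^{n-2} \\ 1 & a_3 - a_1 & (a_3 - a_1)a_3 & \cdots & (a_3 - a_1)a_3^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n - a_1 & (a_n - a_1)a_n & \cdots & (a_n - a_1)a_n^{n-2} \end{vmatrix} = \begin{vmatrix} a_2 - a_1 & (a_2 - a_1)a_2 & \cdots & (a_2 - a_1)a_2^{n-2} \\ a_3 - a_1 & (a_3 - a_1)a_3 & \cdots & (a_3 - a_1)a_3^{n-2} \\ \vdots & \vdots & & \vdots \\ a_n - a_1 & (a_n - a_1)a_n & \cdots & (a_n - a_1)a_n^{n-2} \end{vmatrix},$$

where the second equality holds because the first row is $(1, 0, \dots, 0)$. Factoring out $a_k - a_1$ from the row containing $a_k$ gives

$$\det A = \prod_{k=2}^{n}(a_k - a_1) \cdot \begin{vmatrix} 1 & a_2 & \cdots & a_2^{n-2} \\ 1 & a_3 & \cdots & a_3^{n-2} \\ \vdots & \vdots & & \vdots \\ 1 & a_n & \cdots & a_n^{n-2} \end{vmatrix}.$$

This last determinant is of the same form as the original matrix, but of a smaller size, hence we are done using an easy induction on $n$. □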
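A quick numerical confirmation of the Vandermonde formula is given below. It is a sketch with sample values chosen by us, including $a_1 = 0$ so that the convention $0^0 = 1$ matters. Conveniently, Python itself evaluates `0 ** 0` as `1`. The helper names are hypothetical, and the determinant is computed exactly over the rationals.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, result = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return result


def vandermonde(a):
    """The matrix [a_i^(j-1)]; Python's 0 ** 0 == 1 matches the convention above."""
    n = len(a)
    return [[ai ** j for j in range(n)] for ai in a]


def product_formula(a):
    """prod_{i < j} (a_j - a_i), with 0-based indices."""
    return prod(a[j] - a[i] for i, j in combinations(range(len(a)), 2))


a = [0, 2, -1, 3, 5]
print(det(vandermonde(a)) == product_formula(a))  # True
```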


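Problem 7.58 (The Cauchy Determinant). Let $a_1, \dots, a_n$ and $b_1, \dots, b_n$ be complex numbers such that $a_i + b_j \neq 0$ for $1 \le i, j \le n$. Prove that the determinant of the matrix $A = \left[\frac{1}{a_i + b_j}\right]_{1 \le i,j \le n}$ equals

$$\det A = \frac{\prod_{1 \le i < j \le n}(a_j - a_i)(b_j - b_i)}{\prod_{i,j=1}^{n}(a_i + b_j)}.$$

Solution. We subtract the last column from each of the first $n - 1$ columns and use the identity

$$\frac{1}{a_i + b_j} - \frac{1}{a_i + b_n} = \frac{b_n - b_j}{(a_i + b_j)(a_i + b_n)}.$$

This allows us to factor $b_n - b_j$ out of the $j$-th column (for $j < n$) and $\frac{1}{a_i + b_n}$ out of the $i$-th row, which yields

$$\det A = \frac{\prod_{j=1}^{n-1}(b_n - b_j)}{\prod_{i=1}^{n}(a_i + b_n)}\begin{vmatrix} \frac{1}{a_1+b_1} & \frac{1}{a_1+b_2} & \cdots & \frac{1}{a_1+b_{n-1}} & 1 \\ \frac{1}{a_2+b_1} & \frac{1}{a_2+b_2} & \cdots & \frac{1}{a_2+b_{n-1}} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ \frac{1}{a_n+b_1} & \frac{1}{a_n+b_2} & \cdots & \frac{1}{a_n+b_{n-1}} & 1 \end{vmatrix}.$$

Similarly, subtracting the last row from each of the first $n - 1$ rows in the matrix appearing in the last equality and pulling out common factors, we obtain

$$\begin{vmatrix} \frac{1}{a_1+b_1} & \cdots & \frac{1}{a_1+b_{n-1}} & 1 \\ \vdots & & \vdots & \vdots \\ \frac{1}{a_n+b_1} & \cdots & \frac{1}{a_n+b_{n-1}} & 1 \end{vmatrix} = \frac{\prod_{i=1}^{n-1}(a_n - a_i)}{\prod_{j=1}^{n-1}(a_n + b_j)}\begin{vmatrix} \frac{1}{a_1+b_1} & \cdots & \frac{1}{a_1+b_{n-1}} & 0 \\ \vdots & & \vdots & \vdots \\ \frac{1}{a_{n-1}+b_1} & \cdots & \frac{1}{a_{n-1}+b_{n-1}} & 0 \\ 1 & \cdots & 1 & 1 \end{vmatrix}.$$

Hence

$$\det A = \frac{\prod_{i=1}^{n-1}(a_n - a_i)(b_n - b_i)}{\prod_{i=1}^{n}(a_i + b_n)\prod_{j=1}^{n-1}(a_n + b_j)}\begin{vmatrix} \frac{1}{a_1+b_1} & \cdots & \frac{1}{a_1+b_{n-1}} \\ \vdots & & \vdots \\ \frac{1}{a_{n-1}+b_1} & \cdots & \frac{1}{a_{n-1}+b_{n-1}} \end{vmatrix},$$

which allows us to conclude by induction on $n$, the last determinant being of the same form, but of smaller dimension. □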
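The Cauchy formula can be checked exactly with rational arithmetic. In the sketch below, the sample values of $a$ and $b$ are ours; they are chosen so that every $a_i + b_j$ is nonzero, as the problem requires. The helper names are also hypothetical. The extra check uses $a = (1, 2)$, $b = (0, 1)$, which produces the $2 \times 2$ matrix with entries $\frac{1}{i+j-1}$.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, result = len(m), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return result


def cauchy_matrix(a, b):
    return [[Fraction(1, ai + bj) for bj in b] for ai in a]


def cauchy_formula(a, b):
    n = len(a)
    num = prod((a[j] - a[i]) * (b[j] - b[i]) for i, j in combinations(range(n), 2))
    den = prod(ai + bj for ai in a for bj in b)
    return Fraction(num, den)


a, b = [1, 2, 4, 7], [0, 3, 5, 6]
print(det(cauchy_matrix(a, b)) == cauchy_formula(a, b))  # True
```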


Another useful tool for computing determinants is the Laplace expansion. Consider a matrix $A \in M_n(F)$ with entries $a_{ij}$. The minor of $a_{ij}$ is the determinant $M_{ij}$ of the matrix obtained from $A$ by deleting row $i$ and column $j$. The cofactor of $a_{ij}$ is $C_{ij} = (-1)^{i+j}M_{ij}$.

Example 7.59. The minor of $a_{23}$ in

$$\begin{bmatrix} -2 & -1 & 0 \\ 1 & 2 & 3 \\ 4 & 0 & -2 \end{bmatrix}$$

is

$$M_{23} = \begin{vmatrix} -2 & -1 \\ 4 & 0 \end{vmatrix} = 4$$

and the cofactor of $a_{23}$ is $C_{23} = (-1)^{2+3}M_{23} = -4$.

The cofactors play a key role, thanks to the following theorem. It shows that the computation of a determinant of order $n$ may be reduced to the computation of $n$ determinants of order $n - 1$. Suppose we first use the properties of determinants listed above to create some zeros in the $k$th row. Then we only need the cofactors corresponding to the nonzero entries of this row, so combining these methods reduces the volume of computations.
Theorem 7.60 (Laplace Expansion). Let $A = [a_{ij}] \in M_n(F)$ be a matrix and let $C_{ij}$ be the cofactor of $a_{ij}$.

a) (expansion with respect to column $j$) For each $j \in \{1, 2, \dots, n\}$ we have
$$\det A = \sum_{i=1}^{n} a_{ij}C_{ij}.$$
b) (expansion with respect to row $i$) For each $i \in \{1, 2, \dots, n\}$ we have

$$\det A = \sum_{j=1}^{n} a_{ij}C_{ij}.$$


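Proof. We will prove only part a), the argument being similar for part b). Alternatively, part b) follows from a) using that the determinant of a matrix equals the determinant of its transpose. Fix $j \in \{1, 2, \dots, n\}$, let $\mathcal{B} = (e_1, \dots, e_n)$ be the canonical basis of $F^n$ and let $C_1, \dots, C_n \in F^n$ be the columns of $A$, so that $C_k = \sum_{i=1}^{n} a_{ik}e_i$ for all $k$. We deduce that

$$\det A = \det_{\mathcal{B}}(C_1, \dots, C_n) = \det_{\mathcal{B}}\Big(C_1, \dots, C_{j-1}, \sum_{i=1}^{n} a_{ij}e_i, C_{j+1}, \dots, C_n\Big) = \sum_{i=1}^{n} a_{ij}\det_{\mathcal{B}}(C_1, \dots, C_{j-1}, e_i, C_{j+1}, \dots, C_n).$$

It remains to see that $X_{ij} := \det_{\mathcal{B}}(C_1, \dots, C_{j-1}, e_i, C_{j+1}, \dots, C_n)$ equals $C_{ij}$, the cofactor of $a_{ij}$. By a series of $n - j$ column interchanges, we can put the $j$th column of the determinant $X_{ij}$ in the last position. By a sequence of $n - i$ row interchanges, we can put the $i$th row in the last position. We end up with

$$X_{ij} = (-1)^{n-i+n-j}\begin{vmatrix} a_{11} & \cdots & a_{1,j-1} & a_{1,j+1} & \cdots & a_{1n} & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & \cdots & a_{n,j-1} & a_{n,j+1} & \cdots & a_{nn} & 0 \\ a_{i1} & \cdots & a_{i,j-1} & a_{i,j+1} & \cdots & a_{in} & 1 \end{vmatrix},$$

where the first $n - 1$ rows are the rows of $A$ other than row $i$, in their original order, with column $j$ deleted. The last determinant is precisely the minor $M_{ij}$, thanks to Theorem 7.43 (applied to the transpose). The result follows, since $(-1)^{n-i+n-j} = (-1)^{i+j}$, so that $X_{ij} = (-1)^{i+j}M_{ij} = C_{ij}$. □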
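Theorem 7.60 suggests a recursive algorithm. The Python sketch below always expands along the first row and skips zero entries, in the spirit of the remark preceding the theorem. The function names are ours. Indices are 0-based, which does not change the parity of $i + j$ and hence the sign $(-1)^{i+j}$. The cost grows like $n!$, so this is meant as an illustration rather than a practical method for large matrices.

```python
def minor_matrix(m, i, j):
    """The matrix obtained by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def cofactor(m, i, j):
    """C_ij = (-1)^(i+j) M_ij; the parity of i + j is the same with 0- or 1-based indices."""
    return (-1) ** (i + j) * det_laplace(minor_matrix(m, i, j))


def det_laplace(m):
    """Laplace expansion along the first row, skipping zero entries."""
    if not m:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(len(m)) if m[0][j] != 0)


M = [[-2, -1, 0], [1, 2, 3], [4, 0, -2]]
print(cofactor(M, 1, 2))  # -4, the cofactor C_23 of Example 7.59
```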


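Example 7.61. Expanding with respect to the first row, we obtain

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$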
Problem 7.62. Let

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & -3 \\ 2 & 2 & -2 & -2 \\ 3 & -1 & -1 & -1 \end{bmatrix}.$$

Compute (a) $\det(A)$, (b) $\det(A \cdot {}^tA)$, (c) $\det(A + A)$, (d) $\det(A^{-1})$.

Solution. (a) Subtracting the second row from the first one turns the first row into $(0, 0, 0, 4)$. Expanding with respect to this row yields

$$\det A = -4\begin{vmatrix} 1 & 1 & 1 \\ 2 & 2 & -2 \\ 3 & -1 & -1 \end{vmatrix}.$$

In the new determinant, subtract twice the first row from the second one, and add the first row to the last one. We obtain

$$\det A = -4\begin{vmatrix} 1 & 1 & 1 \\ 0 & 0 & -4 \\ 4 & 0 & 0 \end{vmatrix}.$$

Expanding with respect to the last row yields

$$\det A = -4 \cdot 4 \cdot (-4) = 64.$$

(b) Since the determinant map is multiplicative and $\det A = \det({}^tA)$, we obtain

$$\det(A \cdot {}^tA) = \det A \cdot \det({}^tA) = (\det A)^2 = 64^2 = 4096.$$

(c) We have

$$\det(A + A) = \det(2A) = 2^4 \det(A) = 16 \cdot 64 = 1024.$$

(d) Finally,

$$\det(A^{-1}) = \frac{1}{\det(A)} = \frac{1}{64}. \qquad \square$$


Problem 7.63. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a matrix with nonnegative entries such that the sum of the entries in each row does not exceed 1. Prove that $|\det A| \le 1$.

Solution. We will prove the result by induction on $n$, the case $n = 1$ being clear. Assume that the result holds for $n - 1$ and let $A$ be a matrix as in the statement of the problem. For $1 \le i \le n$ let $A_i$ be the matrix obtained by deleting the first row and column $i$ from $A$. Then

$$\det A = \sum_{i=1}^{n}(-1)^{i+1}a_{1i}\det A_i.$$

Each $A_i$ still has nonnegative entries and row sums at most 1, so by the inductive hypothesis $|\det A_i| \le 1$. We deduce that

$$|\det A| \le \sum_{i=1}^{n} a_{1i}|\det A_i| \le \sum_{i=1}^{n} a_{1i} \le 1$$

and the result follows. □


We have already seen that a matrix $A \in M_n(F)$ is invertible if and only if $\det A \neq 0$. It turns out that we can actually compute the inverse of the matrix $A$ by computing certain determinants. Before doing that, let us introduce a fundamental definition:

Definition 7.64. Let $A \in M_n(F)$ be a square matrix with entries in $F$. The adjugate matrix $\operatorname{adj}(A)$ is the matrix whose $(i, j)$-entry is the cofactor $C_{ji}$ of $a_{ji}$. Thus $\operatorname{adj}(A)$ is the transpose of the matrix whose $(i, j)$-entry is the cofactor $C_{ij}$ of $a_{ij}$.

We have the fundamental result:

Theorem 7.65. If $A \in M_n(F)$ has nonzero determinant, then

$$A^{-1} = \frac{1}{\det A}\operatorname{adj}(A).$$

Proof. It suffices to prove that $A \cdot \operatorname{adj}(A) = \det A \cdot I_n$. Using the multiplication rule, this comes down to checking that

$$\sum_{j=1}^{n} a_{ij}C_{kj} = \det A \cdot \delta_{ik}$$

for all $1 \le i, k \le n$, where $\delta_{ik}$ equals 1 if $i = k$ and 0 otherwise.

If $k = i$, this follows by Laplace expansion of $\det A$ with respect to the $i$th row. So suppose that $k \neq i$, and let $A'$ be the matrix obtained from $A$ by replacing its $k$th row with a copy of the $i$th row. Then rows $i$ and $k$ in $A'$ coincide, forcing $\det A' = 0$. We use the Laplace expansion of $\det A'$ with respect to the $k$th row. The cofactors involved do not change when going from $A$ to $A'$, since only the $k$th row of $A$ is modified. We obtain

$$0 = \det A' = \sum_{j=1}^{n} a_{ij}C_{kj}$$

and the result follows. □
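For small matrices, Theorem 7.65 is easy to turn into code. The Python sketch below is illustrative only, and the helper names are ours. It computes $\operatorname{adj}(A)$ from cofactors, paying attention to the transpose in Definition 7.64, and divides by $\det A$ using exact fractions. As a test we use the matrix of Problem 7.46, whose determinant is 1, so its inverse is just its adjugate.

```python
from fractions import Fraction


def det(m):
    """Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def adjugate(m):
    """adj(A): the (i, j)-entry is the cofactor C_ji (note the transpose)."""
    n = len(m)

    def cof(i, j):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
        return (-1) ** (i + j) * det(minor)

    return [[cof(j, i) for j in range(n)] for i in range(n)]


def inverse(m):
    """A^{-1} = adj(A) / det(A), in exact rational arithmetic."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(x, d) for x in row] for row in adjugate(m)]


A = [[2, 1, 1], [1, 1, 1], [1, 1, 2]]  # the matrix of Problem 7.46
print(adjugate(A))  # [[1, -1, 0], [-1, 3, -1], [0, -1, 1]]
```

As the text goes on to explain, this is not a practical way to invert large matrices. Elimination is far cheaper, and the adjugate formula is mainly of theoretical value.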
The previous theorem does not give a practical way of computing the inverse of a matrix (this involves computing too many determinants), but it is very important from a theoretical point of view. For instance, it says that the entries of $A^{-1}$ are rational functions of the entries of $A$. In particular, they are continuous functions of the entries of $A$ if $A$ has real or complex coefficients. The practical way
of computing the inverse of a matrix has already been presented in the chapter 
concerning linear systems and operations on matrices, so we will not repeat the 
discussion here. 


7.4.1 Problems for Practice 


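1. Let $x$ be a real number. Compute in two different ways the determinant

$$\begin{vmatrix} x & 1 & 1 \\ 1 & x & 1 \\ 1 & 1 & x \end{vmatrix}.$$

2. Let $a, b, c$ be real numbers. Compute the determinant

$$\begin{vmatrix} a-b-c & 2a & 2a \\ 2b & b-c-a & 2b \\ 2c & 2c & c-a-b \end{vmatrix}.$$

3. Let $x$ be a real number. Compute the determinant

$$\begin{vmatrix} \cos x & 0 & \sin x \\ 0 & 1 & 0 \\ -\sin x & 0 & \cos x \end{vmatrix}.$$

4. Let $a, b, c$ be real numbers. Compute the determinant

$$\begin{vmatrix} a+1 & b+1 & c+1 \\ b+c & a+c & a+b \\ 1 & 1 & 1 \end{vmatrix}.$$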

5. Let a,b,c be real numbers. Find a necessary and sufficient condition for the 
vanishing of the following determinant 


(a+b? &@ b? 
a (a+c œe 
pe eh GE 
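6. Let $x, y, z$ be real numbers. By considering the matrices $\begin{bmatrix} 0 & y & z \\ z & x & 0 \\ y & 0 & x \end{bmatrix}$ and $\begin{bmatrix} 0 & z & y \\ y & x & 0 \\ z & 0 & x \end{bmatrix}$, compute the determinant

$$\begin{vmatrix} y^2+z^2 & xy & zx \\ xy & x^2+z^2 & yz \\ zx & yz & x^2+y^2 \end{vmatrix}.$$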


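7. Let $x, y, z$ be real numbers. Compute

$$\begin{vmatrix} 1 & \cos x & \sin x \\ 1 & \cos(x + y) & \sin(x + y) \\ 1 & \cos(x + z) & \sin(x + z) \end{vmatrix}.$$

8. Compute $\det(A)$, where $A$ is the $n \times n$ matrix

$$A = \begin{bmatrix} -1 & 1 & \cdots & 1 \\ 1 & -1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & -1 \end{bmatrix}.$$

9. Let $a, b, c, d$ be real numbers and consider the matrices

$$A = \begin{bmatrix} a & b & c & d \\ b & a & d & c \\ c & d & a & b \\ d & c & b & a \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}.$$

a) Compute $\det B$.
b) By considering the matrix $AB$, compute $\det A$.

10. Let $a$ be a real number. Prove that for $n \ge 3$ we have $D_n = aD_{n-1} - D_{n-2}$, where

$$D_n = \begin{vmatrix} a & 1 & 0 & 0 & \cdots & 0 \\ 1 & a & 1 & 0 & \cdots & 0 \\ 0 & 1 & a & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & a & 1 \\ 0 & \cdots & 0 & 0 & 1 & a \end{vmatrix}$$

is the determinant of an $n \times n$ matrix.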
11. Let $a, b, c \in \mathbb{R}$ and $x = a^2 + b^2 + c^2$, $y = ab + bc + ca$. Prove that

$$\begin{vmatrix} 1 & x & x & x \\ 1 & x+y & x+y & 2x \\ 1 & 2x & x+y & x+y \\ 1 & x+y & 2x & x+y \end{vmatrix} = (a^3 + b^3 + c^3 - 3abc)^2.$$

12. Compute $\det A$ in each of the following cases:

(a) $a_{ij} = \min(i, j)$, for $i, j = 1, \dots, n$.
(b) $a_{ij} = \max(i, j)$, for $i, j = 1, \dots, n$.
(c) $a_{ij} = |i - j|$, for $i, j = 1, \dots, n$.

13. Let $n \ge 2$ and let $A = [a_{ij}] \in M_n(\mathbb{R})$ be the matrix defined by

$$a_{ij} = \begin{cases} 0, & \text{if } i = j, \\ 1, & \text{if } i \neq j. \end{cases}$$

a) Compute $\det A$.
b) Prove that
$$A^{-1} = \frac{2-n}{n-1}I_n + \frac{1}{n-1}A.$$

14. Let $n \ge 3$ and let $A$ be the $n \times n$ matrix whose $(i, j)$-entry is $a_{ij} = \cos\frac{2\pi(i+j)}{n}$ for $i, j \in [1, n]$. Find $\det(I_n + A)$.

15. Let $A$ be a matrix of order 3.

(a) If all the entries in $A$ are 1 or $-1$, show that $\det(A)$ must be an even integer and determine the largest possible value of $\det(A)$.

(b) If all the entries in $A$ are 1 or 0, determine the largest possible value of $\det(A)$.

16. Let $n \ge 2$ and let $x_1, \dots, x_n$ be real numbers. Compute the determinant of the matrix whose entries are $\sin(x_i + x_j)$ for $1 \le i, j \le n$.

17. Let $A \in M_n(\mathbb{R})$ be the matrix whose $(i, j)$-entry is $a_{ij} = \frac{1}{i+j}$. Prove that

$$\det A = \frac{(1!\,2!\cdots n!)^4}{(n!)^2\,1!\,2!\cdots(2n)!}.$$

Hint: use the Cauchy determinant.

18. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$. Compute the determinant of the linear transformation $T$ sending $P \in V$ to $P + P'$.


19. Prove that any matrix $A \in M_n(\mathbb{R})$ with determinant 1 is a product of matrices of the form $I_n + \lambda E_{ij}$, with $i \neq j \in [1, n]$ and $\lambda \in \mathbb{R}$. Hint: use elementary operations on rows and columns.

20. Let $A$ be an invertible matrix with integer coefficients. Prove that $A^{-1}$ has integer entries if and only if $\det A \in \{-1, 1\}$.

21. Let $A, B \in M_n(\mathbb{R})$ be matrices with integer entries such that $A, A + B, \dots, A + 2nB$ are invertible and their inverses have integer entries. Prove that $A + (2n + 1)B$ has the same property. Hint: prove first the existence of a polynomial $P$ with integer coefficients such that $P(x) = \det(A + xB)$ for all $x \in \mathbb{R}$.

22. Let $A, B \in M_n(\mathbb{C})$ be matrices which commute. We want to prove that the adjugate matrices $\operatorname{adj}(A)$ and $\operatorname{adj}(B)$ also commute.

a) Prove the desired result when $A$ and $B$ are invertible.
b) By considering the matrices $A + \frac{1}{k}I_n$ and $B + \frac{1}{k}I_n$ for $k \to \infty$, prove the desired result in all cases.

23. Let $A \in M_n(\mathbb{C})$, with $n \ge 2$. Prove that
$$\det(\operatorname{adj}(A)) = (\det A)^{n-1}.$$

Hint: start by proving the result when $A$ is invertible, then in order to prove the general case consider the matrices $A + \frac{1}{k}I_n$ for $k \to \infty$.

24. (Dodgson condensation) Consider a $3 \times 3$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

View this matrix as being composed of four $2 \times 2$ matrices (overlapping at $a_{22}$). Form a $2 \times 2$ matrix by taking the determinants NW, NE, SW, SE of these four $2 \times 2$ matrices (northwest, northeast, southwest, southeast). Show that

$$\det\begin{bmatrix} \mathrm{NW} & \mathrm{NE} \\ \mathrm{SW} & \mathrm{SE} \end{bmatrix} = a_{22}\det(A).$$

25. (Dodgson condensation, continued) Choose your favorite $4 \times 4$ matrix $A = [a_{ij}]$ with $a_{22}$, $a_{23}$, $a_{32}$, $a_{33}$, and $a_{22}a_{33} - a_{23}a_{32}$ all nonzero.

a) Compute $\det(A)$.

b) View $A$ as being composed of a $3 \times 3$ array of overlapping $2 \times 2$ matrices and compute the determinants of these 9 matrices. Write them in a $3 \times 3$ matrix $B$. View this $3 \times 3$ matrix as being composed of four overlapping $2 \times 2$ matrices. For each of them, compute the determinant and divide by the entry of $A$ that the four entries have in common. Write the results in a $2 \times 2$ matrix $C$. Take the determinant of $C$ and divide it by the central entry of $B$ (the one common to the four determinants that make up $C$). Compare your result to the result of part (a).
Remark. This method of computing a determinant, due to Charles Dodgson,
a.k.a. Lewis Carroll, extends to higher dimensions. It is best visualized if you
imagine filling a pyramidal array of numbers. We start with an $(n+1) \times (n+1)$
array of all ones and we lay the $n \times n$ matrix $A$ whose determinant we want in
the layer above, with each entry of $A$ sitting between four of the ones. At each
stage, we fill the next layer by computing the determinant of the $2 \times 2$ matrix
formed by the four touching cells in the layer below and dividing by the entry
two layers down directly below the cell. (Thus at the first stage there is a division
by 1 that we neglected to mention above.) When you are done, the entry at
the top of the pyramid will be the determinant. (There is a slight complication
here. Following this procedure naively might result in dividing by zero. This is
fixable, but makes the algorithm less pretty.)
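The pyramid procedure of the remark can be sketched in Python as follows. This is a minimal illustration, not a robust implementation: it uses exact rational arithmetic (`fractions.Fraction`) and simply assumes that no division by zero occurs, so the complication mentioned at the end of the remark is not handled. The names `dodgson_det` and `cofactor_det` are our own; the second one is only a reference for comparison.

```python
from fractions import Fraction


def dodgson_det(matrix):
    """Determinant via Dodgson condensation; assumes no zero divisor is met."""
    n = len(matrix)
    below = [[Fraction(1)] * (n + 1) for _ in range(n + 1)]  # layer of ones
    layer = [[Fraction(x) for x in row] for row in matrix]   # the matrix A
    while len(layer) > 1:
        size = len(layer)
        # New cell = 2x2 determinant of the four touching cells,
        # divided by the cell two layers down directly below it.
        new_layer = [
            [
                (layer[i][j] * layer[i + 1][j + 1] - layer[i][j + 1] * layer[i + 1][j])
                / below[i + 1][j + 1]
                for j in range(size - 1)
            ]
            for i in range(size - 1)
        ]
        below, layer = layer, new_layer
    return layer[0][0]


def cofactor_det(matrix):
    """Reference determinant by cofactor expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum(
        (-1) ** j * matrix[0][j] * cofactor_det([row[:j] + row[j + 1:] for row in matrix[1:]])
        for j in range(len(matrix))
    )


# A 4x4 example: the central entries 4, 1, 1, 5 and 4*5 - 1*1 are nonzero,
# as required in the exercise above.
A = [[2, 1, 3, 1], [1, 4, 1, 2], [3, 1, 5, 1], [1, 2, 1, 3]]
print(dodgson_det(A), cofactor_det(A))
```

For a $3 \times 3$ matrix the loop runs twice and the second stage is exactly the identity $NW \cdot SE - NE \cdot SW = a_{22}\det(A)$ from the exercise above.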


7.5 The Vandermonde Determinant 


If $A \in M_n(F)$, by definition
$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}$$


is a polynomial expression in the entries of the matrix A. This suggests using 
properties of polynomials (such as degree, finiteness of the number of roots...) for 
studying determinants. This is a very fruitful idea and we will sketch in this section 
how it works. 

We start with an absolutely fundamental computation, that of Vandermonde 
determinants. These determinants play a crucial role in almost all aspects of 
mathematics. We have already given a proof of the next theorem in Problem 7.57. 
Here we give a different proof. 


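Theorem 7.66. Let $F$ be a field and let $x_1, \ldots, x_n \in F$. Then
$$\begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{vmatrix} = \prod_{1 \leq i < j \leq n} (x_j - x_i).$$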


Proof. We will prove the statement by induction on $n$, the cases $n = 1$ and $n = 2$
being left to the reader. Assume that the result holds for $n - 1$ (and for any choice of
$x_1, \ldots, x_{n-1}$) and let $x_1, \ldots, x_n \in F$. If two of these elements are equal, the result is
clear: the determinant we want to compute has two equal rows, so it must vanish, and so does
the product on the right-hand side. So assume that $x_1, \ldots, x_n$ are pairwise distinct and consider the polynomial
$$P(X) = \begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n-1} & \cdots & x_{n-1}^{n-1} \\ 1 & X & \cdots & X^{n-1} \end{vmatrix}.$$


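Expanding with respect to the last row, we see that
$$P(X) = \begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n-1} & \cdots & x_{n-1}^{n-2} \end{vmatrix} X^{n-1} + a_{n-2}X^{n-2} + \cdots + a_0$$
for some $a_{n-2}, \ldots, a_0 \in F$. Thus by the inductive hypothesis the leading coefficient
of $P$ is $\prod_{1 \leq i < j \leq n-1} (x_j - x_i) \neq 0$.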
Let $i \in [1, n-1]$. Taking $X = x_i$, we obtain a determinant with two equal rows,
which must therefore vanish. It follows that $P(x_1) = \cdots = P(x_{n-1}) = 0$. Since $P$
has degree $n - 1$, leading coefficient $\prod_{1 \leq i < j \leq n-1} (x_j - x_i)$ and vanishes at the $n - 1$
distinct points $x_1, \ldots, x_{n-1}$, we deduce that
$$P(X) = \prod_{1 \leq i < j \leq n-1} (x_j - x_i) \prod_{i=1}^{n-1} (X - x_i).$$
Plugging in $X = x_n$ yields the desired result. □
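As a quick sanity check of Theorem 7.66 (not a substitute for the proof), one can compare both sides over the rationals. The Python sketch below assumes exact `fractions.Fraction` arithmetic and computes the left-hand side with a small Gaussian-elimination determinant; the helper names `det`, `vandermonde` and `vandermonde_product` are our own.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def vandermonde(xs):
    """Rows (1, x_i, x_i^2, ..., x_i^(n-1)), as in Theorem 7.66."""
    return [[Fraction(x) ** j for j in range(len(xs))] for x in xs]


def vandermonde_product(xs):
    """Right-hand side: product of (x_j - x_i) over all pairs i < j."""
    return prod((Fraction(xj) - Fraction(xi) for xi, xj in combinations(xs, 2)), start=Fraction(1))


xs = [2, -1, 3, Fraction(1, 2)]
print(det(vandermonde(xs)) == vandermonde_product(xs))
```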
Remark 7.67. We call the determinant
$$\begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{vmatrix}$$
the Vandermonde determinant associated with $x_1, \ldots, x_n$. It follows from the previous theorem that
the Vandermonde determinant associated with $x_1, \ldots, x_n$ is nonzero if and only
if $x_1, \ldots, x_n$ are pairwise distinct. Vandermonde determinants are ubiquitous in
mathematics and are closely related to the following fundamental problem: "for
distinct complex numbers $x_1, \ldots, x_n$ and arbitrary complex numbers $b_1, \ldots, b_n$,
find a polynomial $P(X)$ of degree at most $n - 1$ such that $P(x_i) = b_i$ for all $i$." Writing this out
as a linear system for the coefficients $a_i$ of $P$ yields the equation $Va = b$, where $b$ is
the column vector whose coordinates are the $b_i$'s and $V$ is the Vandermonde matrix
associated with $x_1, \ldots, x_n$ (thus $\det V$ is the Vandermonde determinant associated
with $x_1, \ldots, x_n$). The fact that the Vandermonde determinant is nonzero is equivalent
to this problem having a unique solution for every choice of $b_1, \ldots, b_n$. The unique solution of this problem
(known as Lagrange interpolation) is given by
$$P(X) = \sum_{j=1}^{n} b_j \prod_{i \neq j} \frac{X - x_i}{x_j - x_i}.$$
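The Lagrange formula can be evaluated directly, without solving the system $Va = b$. The following Python sketch uses exact arithmetic via `fractions.Fraction`; the name `lagrange_eval` and the test cubic are our own illustrative choices. It checks that four nodes reproduce a cubic exactly, as uniqueness predicts.

```python
from fractions import Fraction


def lagrange_eval(xs, bs, t):
    """Value at t of the unique P with deg P <= n - 1 and P(x_j) = b_j.

    Assumes the nodes xs are pairwise distinct (otherwise we divide by zero).
    """
    total = Fraction(0)
    for j, (xj, bj) in enumerate(zip(xs, bs)):
        term = Fraction(bj)
        for i, xi in enumerate(xs):
            if i != j:
                term *= Fraction(t - xi) / (xj - xi)  # factor (t - x_i)/(x_j - x_i)
        total += term
    return total


def cubic(x):
    return x ** 3 - 2 * x + 1


nodes = [0, 1, 2, 3]                 # four distinct nodes determine a cubic
values = [cubic(x) for x in nodes]
print(lagrange_eval(nodes, values, 10), cubic(10))  # both equal 981
```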
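Problem 7.68. Let $a, b, c$ be nonzero real numbers. Prove that
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ c^2 & a^2 & b^2 \\ ac & ab & bc \end{vmatrix} = (a^2 - bc)(b^2 - ca)(c^2 - ab).$$

Solution. Dividing all entries of the first column by $a^2$, all entries of the second
column by $b^2$, and all entries of the third column by $c^2$, we obtain
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ c^2 & a^2 & b^2 \\ ac & ab & bc \end{vmatrix} = (abc)^2 \begin{vmatrix} 1 & 1 & 1 \\ \frac{c^2}{a^2} & \frac{a^2}{b^2} & \frac{b^2}{c^2} \\ \frac{c}{a} & \frac{a}{b} & \frac{b}{c} \end{vmatrix} = -(abc)^2 \begin{vmatrix} 1 & 1 & 1 \\ \frac{c}{a} & \frac{a}{b} & \frac{b}{c} \\ \frac{c^2}{a^2} & \frac{a^2}{b^2} & \frac{b^2}{c^2} \end{vmatrix}.$$
We recognize (the transpose of) a Vandermonde determinant associated with $\frac{c}{a}, \frac{a}{b}, \frac{b}{c}$, thus we can
further write
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ c^2 & a^2 & b^2 \\ ac & ab & bc \end{vmatrix} = -(abc)^2 \left(\frac{a}{b} - \frac{c}{a}\right)\left(\frac{b}{c} - \frac{c}{a}\right)\left(\frac{b}{c} - \frac{a}{b}\right).$$
We have
$$\frac{b}{c} - \frac{c}{a} = \frac{ab - c^2}{ac}$$
and similar identities obtained by permuting $a, b, c$. We conclude that
$$\begin{vmatrix} a^2 & b^2 & c^2 \\ c^2 & a^2 & b^2 \\ ac & ab & bc \end{vmatrix} = (a^2 - bc)(b^2 - ca)(c^2 - ab).$$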
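Since both sides of the identity in Problem 7.68 are polynomials in $a, b, c$, a brute-force check on a grid of integers is a cheap way to catch a transcription error (it is, of course, not a proof). A minimal Python sketch with our own helper names `det3`, `lhs` and `rhs`:

```python
from itertools import product


def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    (p, q, r), (s, t, u), (v, w, x) = m
    return p * (t * x - u * w) - q * (s * x - u * v) + r * (s * w - t * v)


def lhs(a, b, c):
    return det3([[a * a, b * b, c * c], [c * c, a * a, b * b], [a * c, a * b, b * c]])


def rhs(a, b, c):
    return (a * a - b * c) * (b * b - c * a) * (c * c - a * b)


nonzero = [v for v in range(-5, 6) if v != 0]
mismatches = [t for t in product(nonzero, repeat=3) if lhs(*t) != rhs(*t)]
print(len(mismatches))  # number of grid points where the two sides differ
```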


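Problem 7.69. Let $F$ be a field and let $x_1, \ldots, x_n \in F$. Compute
$$\begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-2} & x_1^n \\ 1 & x_2 & \cdots & x_2^{n-2} & x_2^n \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & x_n & \cdots & x_n^{n-2} & x_n^n \end{vmatrix}.$$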
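Solution. Write
$$P(X) = (X - x_1)\cdots(X - x_n) = X^n + a_{n-1}X^{n-1} + \cdots + a_1X + a_0$$
for some scalars $a_0, \ldots, a_{n-1} \in F$, with $a_{n-1} = -(x_1 + \cdots + x_n)$. Next, add to
the last column the first column multiplied by $a_0$, the second column multiplied by
$a_1$, ..., the $(n-1)$th column multiplied by $a_{n-2}$. The value of the determinant does
not change, and since $P(x_i) = 0$ gives
$$x_i^n + a_{n-2}x_i^{n-2} + \cdots + a_0 = -a_{n-1}x_i^{n-1},$$
we deduce that
$$\begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-2} & x_1^n \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & x_n & \cdots & x_n^{n-2} & x_n^n \end{vmatrix} = -a_{n-1} \begin{vmatrix} 1 & x_1 & \cdots & x_1^{n-2} & x_1^{n-1} \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & x_n & \cdots & x_n^{n-2} & x_n^{n-1} \end{vmatrix} = \left(\sum_{i=1}^{n} x_i\right) \prod_{1 \leq i < j \leq n} (x_j - x_i),$$
the last equality being a consequence of Theorem 7.66. □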
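The closed form of Problem 7.69 can be spot-checked numerically. In the Python sketch below (exact `fractions.Fraction` arithmetic; the names `det`, `problem_7_69_matrix` and `closed_form` are our own), the matrix uses the powers $0, 1, \ldots, n-2, n$, skipping $n-1$ exactly as in the statement.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def problem_7_69_matrix(xs):
    """Rows (1, x_i, ..., x_i^(n-2), x_i^n): the power n - 1 is skipped."""
    n = len(xs)
    powers = list(range(n - 1)) + [n]
    return [[Fraction(x) ** p for p in powers] for x in xs]


def closed_form(xs):
    """(x_1 + ... + x_n) times the Vandermonde product over i < j."""
    total = sum(Fraction(x) for x in xs)
    return total * prod((Fraction(xj) - Fraction(xi) for xi, xj in combinations(xs, 2)), start=Fraction(1))


xs = [1, 3, -2, 5]
print(det(problem_7_69_matrix(xs)), closed_form(xs))
```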


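Remark 7.70. An alternate solution is to remark that the desired determinant is the
opposite of the coefficient of $X^{n-1}$ in the Vandermonde determinant
$$\begin{vmatrix} 1 & x_1 & \cdots & x_1^n \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^n \\ 1 & X & \cdots & X^n \end{vmatrix} = \prod_{1 \leq i < j \leq n} (x_j - x_i) \prod_{k=1}^{n} (X - x_k)$$
(expand with respect to the last row: the cofactor of $X^{n-1}$ carries the sign $(-1)^{(n+1)+n} = -1$).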
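Problem 7.71. Let $P_0, \ldots, P_{n-1}$ be monic polynomials with complex coefficients
such that $\deg P_i = i$ for $0 \leq i \leq n-1$ (thus $P_0 = 1$). If $x_1, \ldots, x_n \in \mathbb{C}$, compute
$$\begin{vmatrix} P_0(x_1) & P_1(x_1) & \cdots & P_{n-1}(x_1) \\ P_0(x_2) & P_1(x_2) & \cdots & P_{n-1}(x_2) \\ \vdots & \vdots & & \vdots \\ P_0(x_n) & P_1(x_n) & \cdots & P_{n-1}(x_n) \end{vmatrix}.$$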
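Solution. Let $A$ be the matrix whose determinant we want to compute and let us
write
$$P_i(X) = X^i + c_{i,i-1}X^{i-1} + \cdots + c_{i,0}$$
for some complex numbers $c_{i,j}$. The matrix $A$ is then equal to
$$\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{pmatrix} \begin{pmatrix} 1 & c_{1,0} & c_{2,0} & \cdots & c_{n-1,0} \\ 0 & 1 & c_{2,1} & \cdots & c_{n-1,1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.$$
Since the second matrix is upper-triangular with diagonal entries equal to 1, its
determinant equals 1. Using Theorem 7.66, we deduce that
$$\begin{vmatrix} P_0(x_1) & P_1(x_1) & \cdots & P_{n-1}(x_1) \\ P_0(x_2) & P_1(x_2) & \cdots & P_{n-1}(x_2) \\ \vdots & \vdots & & \vdots \\ P_0(x_n) & P_1(x_n) & \cdots & P_{n-1}(x_n) \end{vmatrix} = \prod_{1 \leq i < j \leq n} (x_j - x_i).$$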


Problem 7.72. For $0 \leq k < n$ compute $\det A$, where
$$A = \begin{pmatrix} 1^k & 2^k & 3^k & \cdots & (n+1)^k \\ 2^k & 3^k & 4^k & \cdots & (n+2)^k \\ \vdots & \vdots & \vdots & & \vdots \\ (n+1)^k & (n+2)^k & (n+3)^k & \cdots & (2n+1)^k \end{pmatrix}.$$
Solution. Consider the matrix
$$A_x = \begin{pmatrix} 1^k & 2^k & \cdots & n^k & (x+1)^k \\ 2^k & 3^k & \cdots & (n+1)^k & (x+2)^k \\ \vdots & \vdots & & \vdots & \vdots \\ (n+1)^k & (n+2)^k & \cdots & (2n)^k & (x+n+1)^k \end{pmatrix}$$
obtained from $A$ by modifying its last column (so that $A_n = A$). Then $p(x) = \det(A_x)$ is a
polynomial in the variable $x$, whose degree is at most $k < n$. Indeed, expanding
the determinant of $A_x$ with respect to the last column shows that $p(x)$ is a linear
combination of $(x+1)^k, \ldots, (x+n+1)^k$, each of which has degree $k$.

Next, observe that $p$ vanishes at $0, 1, \ldots, n-1$, since when $x \in \{0, 1, \ldots, n-1\}$
the last column of $A_x$ equals its $(x+1)$th column, thus $\det(A_x) = 0$. Since $\deg p < n$ and $p$
has at least $n$ distinct roots, it follows that $p$ is the zero polynomial. In particular
$p(n) = 0$, hence the determinant to be evaluated is 0. □
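The conclusion of Problem 7.72 is easy to confirm for small sizes with exact arithmetic. In the sketch below, `problem_7_72_matrix(n, k)` (our own name) builds the $(n+1) \times (n+1)$ matrix with $(i, j)$ entry $(i + j - 1)^k$; the last assertion in the usage shows that the hypothesis $k < n$ cannot simply be dropped, since for $n = k = 1$ the determinant is $-1$.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def problem_7_72_matrix(n, k):
    """(n+1) x (n+1) matrix with (i, j) entry (i + j - 1)^k, 1 <= i, j <= n + 1."""
    return [[(i + j - 1) ** k for j in range(1, n + 2)] for i in range(1, n + 2)]


for n in range(1, 6):
    for k in range(n):
        assert det(problem_7_72_matrix(n, k)) == 0
print(det(problem_7_72_matrix(1, 1)))  # k = n: [[1, 2], [2, 3]]
```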
1. Given real numbers $a, b, c$, compute the determinant
$$\begin{vmatrix} b+c & a+c & a+b \\ b^2+c^2 & a^2+c^2 & a^2+b^2 \\ b^3+c^3 & a^3+c^3 & a^3+b^3 \end{vmatrix}.$$


2. Let $a, b, c$ be real numbers. Compute
$$\begin{vmatrix} a+b & ab & a^2+b^2 \\ b+c & bc & b^2+c^2 \\ c+a & ca & c^2+a^2 \end{vmatrix}.$$


3. Let $z_1, \ldots, z_n$ be pairwise distinct complex numbers. Let $f_i : \mathbb{R} \to \mathbb{C}$ be the
map $x \mapsto e^{z_i x}$. Prove that $f_1, \ldots, f_n$ are linearly independent over $\mathbb{C}$. Hint: if
$a_1 f_1 + \cdots + a_n f_n = 0$, take successive derivatives of this relation and evaluate
at $x = 0$.

4. a) Prove that for any positive integer $n$ there is a polynomial $T_n$ of degree $n$
such that
$$T_n(\cos x) = \cos nx$$
for all real numbers $x$. This polynomial $T_n$ is called the $n$th Chebyshev
polynomial. For instance, $T_1(X) = X$, $T_2(X) = 2X^2 - 1$.
b) Let $x_1, \ldots, x_n$ be real numbers. Using part a), compute the determinant
$$\begin{vmatrix} 1 & \cos x_1 & \cos 2x_1 & \cdots & \cos((n-1)x_1) \\ 1 & \cos x_2 & \cos 2x_2 & \cdots & \cos((n-1)x_2) \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \cos x_n & \cos 2x_n & \cdots & \cos((n-1)x_n) \end{vmatrix}.$$


5. Let $x_1, \ldots, x_n, y_1, \ldots, y_n$ be complex numbers and let $k \in [0, n-1]$. Compute
the determinant of the $n \times n$ matrix whose $(i, j)$-entry is $(x_i + y_j)^k$. Hint:
use the binomial formula and write the matrix as the product of two simpler
matrices, of Vandermonde type.

6. Let $a_0, a_1, \ldots, a_{n-1}$ be complex numbers and consider the matrix
$$A = \begin{pmatrix} a_0 & a_1 & a_2 & \cdots & a_{n-1} \\ a_{n-1} & a_0 & a_1 & \cdots & a_{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ a_1 & a_2 & a_3 & \cdots & a_0 \end{pmatrix}$$
obtained by cyclic permutations of the first row.
a) Let $z = e^{2\pi i/n}$ and consider the matrix $B = [z^{(i-1)(j-1)}]_{1 \leq i, j \leq n}$. Compute the
matrix $AB$.
b) Deduce that
$$\det A = \prod_{k=1}^{n} \left( \sum_{j=0}^{n-1} a_j z^{jk} \right).$$


7. Consider the "curve" $C = \{(1, t, t^2, \ldots, t^{n-1}) \mid t \in \mathbb{C}\}$ in $\mathbb{C}^n$, where $n$ is a
positive integer. Prove that any $n$ pairwise distinct points of $C$ form a basis
of $\mathbb{C}^n$.


8. Using Vandermonde's determinant, prove that we cannot find finitely many
maps $f_i, g_i : \mathbb{R} \to \mathbb{R}$ ($1 \leq i \leq n$) such that
$$e^{xy} = \sum_{i=1}^{n} f_i(x) g_i(y)$$
for all $x, y \in \mathbb{R}$.


9. Let $z_1, z_2, \ldots, z_n$ be complex numbers such that
$$z_1 + z_2 + \cdots + z_n = z_1^2 + z_2^2 + \cdots + z_n^2 = \cdots = z_1^n + z_2^n + \cdots + z_n^n = 0.$$
Prove that $z_1 = z_2 = \cdots = z_n = 0$.


10. Prove that there exists an infinite set of points
$$\ldots, P_{-3}, P_{-2}, P_{-1}, P_0, P_1, P_2, P_3, \ldots$$
in the plane with the following property: for any three distinct integers $a$, $b$, and
$c$, points $P_a$, $P_b$, and $P_c$ are collinear if and only if $a + b + c = 2014$. Hint: let
$P_n$ be the point with coordinates $(x, x^3)$, where $x = n - \frac{2014}{3}$.


7.6 Linear Systems and Determinants

In this section we will use determinants to make a more refined study of linear
systems. Before doing that, we will show that the computation of the rank of a
matrix $A \in M_{m,n}(F)$ can be reduced to the computation of a certain number of
determinants. This will be very important for applications to linear systems, but is
not very useful in practice, since it is much more efficient to compute the rank of a matrix
by computing its reduced echelon form (the rank of $A$ is then simply the number of pivots).

Let $A \in M_{m,n}(F)$ be a matrix. Recall that a sub-matrix of $A$ is a matrix obtained by
deleting a certain number of rows and columns of $A$.
Theorem 7.73. Let $A \in M_{m,n}(F)$ be a matrix of rank $r$. Then

a) There is an invertible $r \times r$ sub-matrix in $A$.
b) For $k > r$, any $k \times k$ sub-matrix of $A$ is not invertible.

In other words, the rank of $A$ is the largest size of an invertible sub-matrix of $A$.


Proof. Let $A = [a_{ij}]$ and let $C_1, \ldots, C_n$ be the columns of $A$, so that
$$r = \dim \operatorname{Span}(C_1, \ldots, C_n).$$
Let $d$ be the largest size of an invertible sub-matrix of $A$. We will prove separately
the inequalities $d \geq r$ and $r \geq d$.

We start by proving the inequality $r \geq d$. Let $B$ be an invertible $d \times d$ sub-matrix
of $A$. Permuting the rows and columns of $A$ (which does not change its rank), we
may assume that $B$ consists of the first $d$ rows and columns of $A$. Then $C_1, \ldots, C_d$
are linearly independent (as any nontrivial linear relation between $C_1, \ldots, C_d$ would
induce a nontrivial relation between the columns of $B$, contradicting the fact that $B$
is invertible). But then
$$r = \dim \operatorname{Span}(C_1, \ldots, C_n) \geq \dim \operatorname{Span}(C_1, \ldots, C_d) = d.$$
Let us prove now that $r \leq d$. By definition of $r$, we know that we can find $r$
columns of $A$ which form a basis of the space generated by the columns of $A$. Let $B$
be the $m \times r$ matrix obtained by deleting all other columns of $A$ except for these $r$.
Then $B$ has rank $r$. But then ${}^tB$ also has rank $r$ (because a matrix and its transpose
have the same rank), thus the space generated by the rows of $B$ has dimension $r$.
In particular, we can find $r$ rows of $B$ which are linearly independent. The sub-matrix
obtained from $B$ by deleting all other rows except for these $r$ is an invertible
$r \times r$ sub-matrix of $A$, thus $d \geq r$. □
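Theorem 7.73 translates directly into a (very slow) rank algorithm: look for the largest invertible square sub-matrix. The Python sketch below, with our own function names and exact `fractions.Fraction` arithmetic, compares it with the pivot count from Gauss-Jordan elimination. As the text warns, the minor-based version is impractical: it may examine a number of sub-matrices that grows exponentially with the size of $A$.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def rank_by_minors(a):
    """Largest size of an invertible square sub-matrix (Theorem 7.73)."""
    m, n = len(a), len(a[0])
    for size in range(min(m, n), 0, -1):
        for rows in combinations(range(m), size):
            for cols in combinations(range(n), size):
                if det([[a[r][c] for c in cols] for r in rows]) != 0:
                    return size
    return 0


def rank_by_elimination(a):
    """Number of pivots after Gauss-Jordan elimination (the practical method)."""
    m = [[Fraction(x) for x in row] for row in a]
    rank = 0
    for c in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][c] != 0:
                factor = m[r][c] / m[rank][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


A = [[1, 3, -5], [1, 4, -8], [-3, -7, 9]]
print(rank_by_minors(A), rank_by_elimination(A))
```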


Problem 7.74. Let $v_1, \ldots, v_p \in F^n$ be vectors and let $A \in M_{n,p}(F)$ be the matrix
whose columns are $v_1, \ldots, v_p$. Prove that $v_1, \ldots, v_p$ are linearly independent if and
only if $A$ has a $p \times p$ invertible sub-matrix.

Solution. The vectors $v_1, \ldots, v_p$ are linearly independent if and only if they form a basis of
$\operatorname{Span}(v_1, \ldots, v_p)$, or equivalently if $\dim \operatorname{Span}(v_1, \ldots, v_p) = p$. Finally, this is further
equivalent to $\operatorname{rank}(A) = p$. Since $A$ has only $p$ columns, none of its square sub-matrices
has size larger than $p$, so the result follows directly from the previous theorem. □


Problem 7.75. Consider the vectors
$$v_1 = (1, x, 0, 1), \quad v_2 = (0, 1, 2, 1), \quad v_3 = (1, 1, 1, 1) \in \mathbb{R}^4.$$
Prove that for any choice of $x \in \mathbb{R}$ the vectors $v_1, v_2, v_3$ are linearly independent.

Solution. The matrix whose columns are $v_1, v_2, v_3$ is
$$A = \begin{pmatrix} 1 & 0 & 1 \\ x & 1 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
By Problem 7.74, $v_1, v_2, v_3$ are linearly independent if and only if $A$ has a $3 \times 3$
invertible sub-matrix. Such a matrix is obtained by deleting one row of $A$. Deleting
the second row yields a sub-matrix whose determinant is
$$\begin{vmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 1 \end{vmatrix} = -1,$$
thus the corresponding sub-matrix is invertible and the result follows. □


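Thanks to the previous results, we can make a detailed study of linear systems.
Consider the linear system
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ \quad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \end{cases}$$
with $A = [a_{ij}] \in M_{m,n}(F)$, $b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \in F^m$ a given vector and unknowns
$x_1, \ldots, x_n$. Let $X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$ and let $C_1, \ldots, C_n$ be the columns of $A$. Then the
system can be written as
$$AX = b \quad \text{or} \quad x_1C_1 + \cdots + x_nC_n = b.$$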


The first fundamental theorem of linear systems is the following: 


Theorem 7.76 (Rouché-Capelli). Consider the linear system above and let
$[A, b] \in M_{m,n+1}(F)$ be the matrix obtained by adding to $A$ a rightmost column
equal to $b$. Then

a) The system is consistent⁴ if and only if $\operatorname{rank}(A) = \operatorname{rank}[A, b]$.

b) Assume that the system is consistent and let $X_0$ be a solution of it. Let $S_h$ be the
set of solutions of the associated homogeneous system.⁵ Then the set of solutions
of the original system is $\{X_0 + X \mid X \in S_h\}$ and $S_h$ is a vector space of dimension
$n - \operatorname{rank}(A)$ over $F$.


Proof. a) The system being equivalent to $b = x_1C_1 + \cdots + x_nC_n$, it is consistent
if and only if $b$ is a linear combination of $C_1, \ldots, C_n$, which is equivalent
to $b \in \operatorname{Span}(C_1, \ldots, C_n)$. This is further equivalent to
$\operatorname{Span}(C_1, \ldots, C_n) = \operatorname{Span}(C_1, \ldots, C_n, b)$ and finally, since the first space is
contained in the second, to
$$\dim \operatorname{Span}(C_1, \ldots, C_n) = \dim \operatorname{Span}(C_1, \ldots, C_n, b).$$
By definition, the left-hand side equals $\operatorname{rank}(A)$ and the right-hand side equals
$\operatorname{rank}[A, b]$. The result follows.

b) By Proposition 3.2 we know that the set of solutions of the system is
$\{X_0 + X \mid X \in S_h\}$. It remains to prove that $S_h$ is of dimension $n - \operatorname{rank}(A)$. But the
corresponding homogeneous system can be written $AX = 0$, thus its set of
solutions is the kernel of the map $T$ sending $X \in F^n$ to $AX \in F^m$. By the
rank-nullity theorem we deduce that
$$\dim S_h = n - \dim \operatorname{Im}(T) = n - \operatorname{rank}(A)$$
and the theorem is proved. □


Let us take for simplicity $F = \mathbb{R}$ or $F = \mathbb{C}$ (the same argument applies to
any infinite field). It follows from the previous theorem that we have the following
possibilities:

• the system has no solution. This happens precisely when $A$ and $[A, b]$ do not have
the same rank.

• the system has exactly one solution, which happens if and only if it is consistent and $A$ has rank
exactly $n$, or equivalently its columns are linearly independent.

• the system has more than one solution, and then it has infinitely many solutions.
More precisely, the solutions depend on $n - \operatorname{rank}(A)$ parameters.
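The three cases above can be read off mechanically from two ranks. The following Python sketch is only an illustration over the rationals (an infinite field), with our own names `rank` and `classify_system`; it applies Theorem 7.76 to the two systems of Problem 7.80 below.

```python
from fractions import Fraction


def rank(a):
    """Number of pivots after Gauss-Jordan elimination, exact over Q."""
    m = [[Fraction(x) for x in row] for row in a]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def classify_system(a, b):
    """Classify AX = b by comparing rank(A) with rank[A, b] (Theorem 7.76)."""
    n = len(a[0])
    r = rank(a)
    r_aug = rank([row + [bi] for row, bi in zip(a, b)])
    if r != r_aug:
        return "no solution", None
    if r == n:
        return "unique solution", 0
    return "infinitely many solutions", n - r  # number of free parameters


A = [[1, 3, -5], [1, 4, -8], [-3, -7, 9]]
print(classify_system(A, [4, 5, 6]))    # Problem 7.80 a)
print(classify_system(A, [4, 7, -6]))   # Problem 7.80 b)
```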


Here is an important consequence of the previous results: 


Theorem 7.77. Let $A \in M_{m,n}(F)$ and let $F_1$ be a field containing $F$. Consider the
linear system $AX = 0$. If it has a nontrivial solution in $F_1^n$, then it has a nontrivial
solution in $F^n$.

Proof. Since the system has nontrivial solutions in $F_1^n$, $A$ has rank $r < n$ seen as an
element of $M_{m,n}(F_1)$. But Theorem 7.73 shows that the rank of $A$ seen as an element
of $M_{m,n}(F_1)$ or $M_{m,n}(F)$ is the same (the determinants of the sub-matrices of $A$ do not
depend on the field in which we compute them), thus using again the previous discussion we
deduce that the system has nontrivial solutions in $F^n$. □


⁴ Recall that this simply means that the system has at least one solution.


⁵ This is the system $AX = 0$, i.e., it has the same unknowns, but $b$ is equal to 0.
To make a deeper study of the linear system $AX = b$, assume that it is consistent
and that $A$ has rank $r$ (therefore $[A, b]$ also has rank $r$). By Theorem 7.73 the matrix
$A$ has an invertible $r \times r$ sub-matrix. By permuting the equations and the unknowns
of the system, we may assume that the sub-matrix consisting of the first $r$ rows and
columns of $A$ is invertible. Then $x_1, \ldots, x_r$ are called the principal unknowns and
the first $r$ equations of the system are called the principal equations. All other
equations can be deduced from the first $r$ (the first $r$ rows of $[A, b]$ are linearly
independent and $[A, b]$ has rank $r$, so every other row is a linear combination of them),
so separating the principal and non-principal unknowns yields the equivalent system
$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1r}x_r = b_1 - a_{1,r+1}x_{r+1} - \cdots - a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2r}x_r = b_2 - a_{2,r+1}x_{r+1} - \cdots - a_{2n}x_n \\ \quad\vdots \\ a_{r1}x_1 + a_{r2}x_2 + \cdots + a_{rr}x_r = b_r - a_{r,r+1}x_{r+1} - \cdots - a_{rn}x_n \end{cases}$$
This is a Cramer system, that is, the number of unknowns (which are $x_1, \ldots, x_r$)
equals the number of equations and the matrix of the system (which is $[a_{ij}]_{1 \leq i,j \leq r}$)
is invertible. This kind of system has a unique solution, which can be expressed in
terms of some determinants, as the following theorem shows:


Theorem 7.78. Let $A = [a_{ij}]_{1 \leq i,j \leq n}$ be an invertible matrix in $M_n(F)$, let
$b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in F^n$ be a given vector and consider the system $AX = b$ with the unknowns
$x_1, \ldots, x_n$. Then the system has a unique solution
$$X = A^{-1}b$$
and we have for all $i \in [1, n]$
$$x_i = \frac{\Delta_i}{\Delta},$$
where $\Delta = \det A$ and $\Delta_i$ is the determinant of the matrix obtained from $A$ by
replacing the $i$th column with the vector $b$.


Proof. It is clear that the system $AX = b$ is equivalent to $X = A^{-1}b$ and so it has
a unique solution. To prove the second part, let $e_1, \ldots, e_n$ be the canonical basis of
$F^n$ and write $\det$ instead of $\det_{(e_1, \ldots, e_n)}$. If $C_1, \ldots, C_n$ are the columns of $A$, then by
definition
$$\Delta_i = \det(C_1, \ldots, C_{i-1}, b, C_{i+1}, \ldots, C_n).$$
Since $AX = b$, we have $x_1C_1 + \cdots + x_nC_n = b$. Since $\det$ is multilinear and
alternating, we obtain
$$\Delta_i = \det\Big(C_1, \ldots, C_{i-1}, \sum_{j=1}^{n} x_jC_j, C_{i+1}, \ldots, C_n\Big) = \sum_{j=1}^{n} x_j \det(C_1, \ldots, C_{i-1}, C_j, C_{i+1}, \ldots, C_n)$$
$$= x_i \det(C_1, \ldots, C_{i-1}, C_i, C_{i+1}, \ldots, C_n) = x_i \det A = x_i\Delta,$$
since for $j \neq i$ the determinant $\det(C_1, \ldots, C_{i-1}, C_j, C_{i+1}, \ldots, C_n)$ has two equal
columns and therefore vanishes. The result follows. □
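Cramer's formulae translate almost literally into code. The Python sketch below (exact `fractions.Fraction` arithmetic; `det` and `cramer` are our own names) replaces the $i$th column by $b$ and divides by $\Delta$. It computes $n + 1$ determinants, so for large systems Gaussian elimination on $[A, b]$ remains the better choice; the example is the system of Problem 7.81 below with $a = b = c = 1$.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def cramer(a, b):
    """Solve AX = b via x_i = Delta_i / Delta (Theorem 7.78); A must be invertible."""
    delta = det(a)
    if delta == 0:
        raise ValueError("det A = 0: not a Cramer system")
    solution = []
    for i in range(len(a)):
        # Delta_i: the matrix A with its i-th column replaced by b.
        a_i = [row[:i] + [bi] + row[i + 1:] for row, bi in zip(a, b)]
        solution.append(det(a_i) / delta)
    return solution


print(cramer([[2, 1, 1], [1, 2, 1], [1, 1, 2]], [1, 1, 1]))
```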


Finally, we want to give another criterion for consistency. Recall that
$A \in M_{m,n}(F)$ is the matrix of the system, that $\operatorname{rank}(A) = r$, and that
(after permuting the unknowns and the equations) the $r \times r$ sub-matrix of $A$
consisting of the first $r$ rows and columns of $A$ is invertible.

Theorem 7.79. Under the previous hypotheses, the system $AX = b \in F^m$ is
consistent if and only if for all $k \in [r+1, m]$ we have
$$\Delta_k = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1r} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2r} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rr} & b_r \\ a_{k1} & a_{k2} & \cdots & a_{kr} & b_k \end{vmatrix} = 0.$$
Proof. If the system is consistent, then $b$ is a linear combination of $C_1, \ldots, C_r$ (these
columns are linearly independent and $A$ has rank $r$, so they span the space generated by the columns of $A$),
so the last column of the matrix defining $\Delta_k$ is a linear combination of the other
columns, thus $\Delta_k = 0$ for all $k \in [r+1, m]$.

Conversely, assume that all determinants $\Delta_k$ are 0. Note that it makes sense to
define $\Delta_k$ for $k \leq r$ by the same formula, and it is clear that we still have $\Delta_k = 0$
for $k \leq r$ (as the corresponding matrix has two equal rows). Expanding $\Delta_k$ with
respect to its last row and denoting by $A_{r+1,1}, \ldots, A_{r+1,r}$ the corresponding cofactors
(which are independent of $k$) we obtain
$$A_{r+1,1}a_{k1} + \cdots + A_{r+1,r}a_{kr} + \det(a_{ij})_{1 \leq i,j \leq r}\, b_k = 0$$
for all $k$ and so
$$A_{r+1,1}C_1 + \cdots + A_{r+1,r}C_r + \det(a_{ij})_{1 \leq i,j \leq r}\, b = 0.$$
Since $\det(a_{ij})_{1 \leq i,j \leq r} \neq 0$ by assumption, this shows that $b \in \operatorname{Span}(C_1, \ldots, C_r)$ and
so $b$ is a linear combination of the columns of $A$, which means that the system is
consistent. □
Let us see a few examples of explicit resolutions of linear systems using the
above ideas. Actually, for the first one the method of reduced echelon form is much
more practical than the following. We strongly suggest that the reader compare the
two methods.

Problem 7.80. a) Solve in real numbers the system
$$\begin{cases} x_1 + 3x_2 - 5x_3 = 4 \\ x_1 + 4x_2 - 8x_3 = 5 \\ -3x_1 - 7x_2 + 9x_3 = 6 \end{cases}$$
b) Solve the system
$$\begin{cases} x_1 + 3x_2 - 5x_3 = 4 \\ x_1 + 4x_2 - 8x_3 = 7 \\ -3x_1 - 7x_2 + 9x_3 = -6 \end{cases}$$


Solution. a) The matrix of the system is
$$A = \begin{pmatrix} 1 & 3 & -5 \\ 1 & 4 & -8 \\ -3 & -7 & 9 \end{pmatrix}$$
and one easily computes $\det A = 0$. Thus the system is not a Cramer system.
Looking at the sub-matrix of $A$ consisting of the first two rows and columns, we
see that it is invertible. It follows that $A$ has rank 2. The system is consistent if
and only if
$$\operatorname{rank}\begin{pmatrix} 1 & 3 & -5 & 4 \\ 1 & 4 & -8 & 5 \\ -3 & -7 & 9 & 6 \end{pmatrix} = 2.$$
This means that all matrices obtained from this matrix by deleting one column
have determinant 0. But one easily checks that the matrix obtained by deleting
the second column is invertible (its determinant is $-48$), thus the system is not consistent and so it has
no solution.

b) The matrix of the system is the same. The system is consistent if and only
if all matrices obtained by deleting one column from
$$\begin{pmatrix} 1 & 3 & -5 & 4 \\ 1 & 4 & -8 & 7 \\ -3 & -7 & 9 & -6 \end{pmatrix}$$
have determinant 0. One easily checks that this is the case, thus the system has
infinitely many solutions. The principal unknowns are $x_1, x_2$ and the system is
equivalent to
$$x_1 + 3x_2 = 5x_3 + 4, \quad x_1 + 4x_2 = 8x_3 + 7$$
and can be solved using Cramer's formulae or directly. One finds
$$x_2 = 3x_3 + 3, \quad x_1 = -4x_3 - 5.$$
We conclude that the solutions of the system are given by $(-4t - 5, 3t + 3, t)$
with $t \in \mathbb{R}$. □


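Problem 7.81. Let $a, b, c$ be given real numbers. Solve the linear system
$$\begin{cases} (b+c)x + by + cz = 1 \\ ax + (c+a)y + cz = 1 \\ ax + by + (a+b)z = 1 \end{cases}$$

Solution. The matrix of the system is
$$A = \begin{pmatrix} b+c & b & c \\ a & a+c & c \\ a & b & a+b \end{pmatrix}$$
and a brutal computation left to the reader shows that
$$\det A = 4abc.$$
We consider therefore two cases.
If $abc \neq 0$ the system is a Cramer system with a unique solution given by
Cramer's formulae
$$x = \frac{\begin{vmatrix} 1 & b & c \\ 1 & a+c & c \\ 1 & b & a+b \end{vmatrix}}{4abc}, \quad y = \frac{\begin{vmatrix} b+c & 1 & c \\ a & 1 & c \\ a & 1 & a+b \end{vmatrix}}{4abc}, \quad z = \frac{\begin{vmatrix} b+c & b & 1 \\ a & a+c & 1 \\ a & b & 1 \end{vmatrix}}{4abc}.$$
In order to compute $x$ explicitly, we subtract $b$ times the first column from the
second one, and $c$ times the first column from the third one, ending up with
$$x = \frac{\begin{vmatrix} 1 & 0 & 0 \\ 1 & a+c-b & 0 \\ 1 & 0 & a+b-c \end{vmatrix}}{4abc} = \frac{(a+c-b)(a+b-c)}{4abc},$$
and we similarly obtain
$$y = \frac{(b+c-a)(b+a-c)}{4abc}, \quad z = \frac{(a+c-b)(b+c-a)}{4abc}.$$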
In the second case we have $abc = 0$, that is, $A$ is not invertible. The system is
consistent if and only if
$$\operatorname{rank}(A) = \operatorname{rank}\begin{pmatrix} b+c & b & c & 1 \\ a & a+c & c & 1 \\ a & b & a+b & 1 \end{pmatrix}.$$
While one can follow the discussion given in this chapter, it is much easier to deal
with this case as follows: say without loss of generality that $a = 0$ (the system does not
change if we permute $a, b, c$ and $x, y, z$ simultaneously). The system becomes
$$\begin{cases} (b+c)x + by + cz = 1 \\ c(y+z) = 1 \\ b(y+z) = 1 \end{cases}$$
It is clear from the second and third equations that if the system is consistent, then
necessarily $b = c$ and $b$ is nonzero. So if $b \neq c$ or $bc = 0$, then the system has
no solution in this case. Assume therefore that $b = c$ is nonzero. The system is
equivalent to
$$\begin{cases} b(2x + y + z) = 1 \\ b(y+z) = 1 \end{cases}$$
making it clear that $x = 0$ and $y + z = \frac{1}{b}$. In this case the solutions of the system
are given by $(0, y, \frac{1}{b} - y)$ with $y \in \mathbb{R}$. □
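The closed-form solution of the case $abc \neq 0$ can be checked by substituting it back into the three equations. The Python sketch below does this exactly with `fractions.Fraction` on a grid of nonzero integers; the names `formula_7_81` and `residuals` are our own, and this is a spot check, not a proof.

```python
from fractions import Fraction
from itertools import product


def formula_7_81(a, b, c):
    """Closed-form solution of Problem 7.81, valid when abc != 0."""
    d = 4 * a * b * c
    x = Fraction((a + c - b) * (a + b - c), d)
    y = Fraction((b + c - a) * (b + a - c), d)
    z = Fraction((a + c - b) * (b + c - a), d)
    return x, y, z


def residuals(a, b, c):
    """Left-hand side minus right-hand side of each equation."""
    x, y, z = formula_7_81(a, b, c)
    return ((b + c) * x + b * y + c * z - 1,
            a * x + (c + a) * y + c * z - 1,
            a * x + b * y + (a + b) * z - 1)


values = [v for v in range(-4, 5) if v != 0]
bad = [t for t in product(values, repeat=3) if residuals(*t) != (0, 0, 0)]
print(len(bad), formula_7_81(1, 2, 3))
```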


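Problem 7.82. Let $n$ be an integer greater than 1. Solve the linear system
$$\begin{cases} x_1 + x_2 + \cdots + x_n = 1 \\ x_1 + 2x_2 + \cdots + nx_n = 0 \\ \quad\vdots \\ x_1 + 2^{n-1}x_2 + \cdots + n^{n-1}x_n = 0 \end{cases}$$

Solution. The matrix of the system is
$$A = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ 1 & 2 & \cdots & n-1 & n \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & 2^{n-1} & \cdots & (n-1)^{n-1} & n^{n-1} \end{pmatrix}$$
and it is invertible as its determinant is a Vandermonde determinant (associated with the
pairwise distinct numbers $1, 2, \ldots, n$). Therefore the system is a Cramer system and we have
$$x_i = \frac{\begin{vmatrix} 1 & \cdots & 1 & 1 & 1 & \cdots & 1 \\ 1 & \cdots & i-1 & 0 & i+1 & \cdots & n \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 1 & \cdots & (i-1)^{n-1} & 0 & (i+1)^{n-1} & \cdots & n^{n-1} \end{vmatrix}}{\det A}.$$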
The numerator of the previous fraction is the Vandermonde determinant attached
to $1, 2, \ldots, i-1, 0, i+1, \ldots, n$, while the denominator is the Vandermonde determinant
attached to $1, 2, \ldots, i-1, i, i+1, \ldots, n$. Recalling that the Vandermonde
determinant attached to $x_1, \ldots, x_n$ equals $\prod_{1 \leq j < k \leq n} (x_k - x_j)$ and canceling the common
factors in the numerator and denominator, we end up with
$$x_i = \frac{\prod_{j<i}(0 - j)\prod_{k>i}(k - 0)}{\prod_{j<i}(i - j)\prod_{k>i}(k - i)} = \frac{(-1)^{i-1}(i-1)!\,\frac{n!}{i!}}{(i-1)!\,(n-i)!} = (-1)^{i-1}\frac{n!}{i!\,(n-i)!} = (-1)^{i-1}\binom{n}{i}.$$

Remark 7.83. An alternate solution goes as follows: write $P(T) = \sum_{k=1}^{n} x_k T^k$.
Then the first equation reads $P(1) = 1$ and the remaining ones are equivalent to $P^{(k)}(1) = 0$ for
$k = 1, \ldots, n-1$. Since we also have $P(0) = 0$ by construction, we see that the
unique solution is $P(T) = 1 - (1 - T)^n$, from which we read off the coefficients
$x_k = (-1)^{k-1}\binom{n}{k}$.
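Substituting the answer $x_i = (-1)^{i-1}\binom{n}{i}$ back into the system is a one-liner with integer arithmetic. In this Python sketch, `solution_7_82` and `left_hand_sides` are our own names; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def solution_7_82(n):
    """x_i = (-1)^(i-1) * C(n, i) for i = 1, ..., n."""
    return [(-1) ** (i - 1) * comb(n, i) for i in range(1, n + 1)]


def left_hand_sides(n, xs):
    """The sums 1^k x_1 + 2^k x_2 + ... + n^k x_n for k = 0, ..., n - 1."""
    return [sum(i ** k * x for i, x in enumerate(xs, start=1)) for k in range(n)]


for n in range(2, 9):
    assert left_hand_sides(n, solution_7_82(n)) == [1] + [0] * (n - 1)
print(solution_7_82(4))  # [4, -6, 4, -1]
```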


Problem 7.84. Let $a_1, \ldots, a_n, b_1, \ldots, b_n$ be pairwise distinct complex numbers
such that $a_i + b_j \neq 0$ for all $i, j \in [1, n]$. Find all complex numbers $x_1, \ldots, x_n$
such that for all $i \in [1, n]$ we have
$$\sum_{j=1}^{n} \frac{x_j}{a_i + b_j} = 1.$$

Solution. The determinant of the associated matrix $A = \left[\frac{1}{a_i + b_j}\right]_{1 \leq i,j \leq n}$ is a Cauchy determinant and
equals (by Problem 7.58)
$$\det A = \frac{\prod_{1 \leq i < j \leq n} (a_j - a_i)(b_j - b_i)}{\prod_{1 \leq i,j \leq n} (a_i + b_j)} \neq 0.$$
Thus the system has at most one solution. One could in principle argue as in the
previous problem to find this solution, but there is a much more elegant (and very
useful) technique that we prefer to present. Consider the rational function
$$F(X) = \sum_{j=1}^{n} \frac{x_j}{X + b_j}.$$
The system is equivalent to $F(a_1) = \cdots = F(a_n) = 1$. Write
$$F(X) = \frac{P(X)}{Q(X)}, \quad Q(X) = (X + b_1)\cdots(X + b_n)$$
for some polynomial $P$, and notice that $\deg P \leq n - 1$. The system becomes
$P(a_j) = Q(a_j)$ for $1 \leq j \leq n$. Since $Q - P$ is a monic polynomial of degree $n$
vanishing at $a_1, \ldots, a_n$ and since $a_1, \ldots, a_n$ are pairwise distinct, we deduce that the
system is equivalent to
$$Q(X) - P(X) = \prod_{k=1}^{n} (X - a_k).$$
The conclusion is that $x_1, \ldots, x_n$ is a solution of the system if and only if
$$\sum_{j=1}^{n} \frac{x_j}{X + b_j} = \frac{\prod_{k=1}^{n}(X + b_k) - \prod_{k=1}^{n}(X - a_k)}{\prod_{k=1}^{n}(X + b_k)}.$$
In order to find each $x_j$, we multiply the previous relation by $X + b_j$ and then make
$X$ tend to $-b_j$. We deduce that
$$x_j = \lim_{X \to -b_j} \frac{\prod_{k=1}^{n}(X + b_k) - \prod_{k=1}^{n}(X - a_k)}{\prod_{k \neq j}(X + b_k)} = \frac{-\prod_{k=1}^{n}(-b_j - a_k)}{\prod_{k \neq j}(b_k - b_j)} = \frac{\prod_{k=1}^{n}(a_k + b_j)}{\prod_{k \neq j}(b_j - b_k)}.$$
It follows that the system has a unique solution, given by
$$x_j = \frac{\prod_{k=1}^{n}(a_k + b_j)}{\prod_{k \neq j}(b_j - b_k)}, \quad 1 \leq j \leq n.$$

7.6.1 Problems for Practice 


1. Let $A \in M_n(\mathbb{C})$ and let $B = \operatorname{adj}(A)$ be the adjugate matrix.

a) Prove that if $A$ is invertible, then so is $B$.
b) Prove that if $A$ has rank $n - 1$, then $B$ has rank 1.
c) Prove that if $A$ has rank at most $n - 2$, then $B = O_n$.

2. Using the previous result, find all matrices $A \in M_n(\mathbb{C})$ which are equal to their
adjugate matrix. Hint: the case $n = 2$ is special.
3. Let $a, b$ be complex numbers and consider the matrix $A \in M_n(\mathbb{C})$ whose
diagonal entries are all equal to $a$, and such that all other entries of $A$ are equal
to $b$.


a) Compute det A. 
b) Find the rank of A (you will need to distinguish several cases, according to 
the values of a and b). 


4. Find real numbers $a, b, c$ such that for all polynomials $P$ with real coefficients
and whose degree does not exceed 2 we have
$$\int_0^1 P(x)\,dx = aP(0) + bP\left(\tfrac{1}{2}\right) + cP(1).$$


5. Given real numbers $a, b, c, u, v, w$, solve the linear system
$$\begin{cases} ax - by = u \\ by - cz = v \\ cz - ax = w \end{cases}$$


6. Given a real number a, solve the linear system 


y AE 
tea + ia + u =l 
x 4 — 
zea + aaa + aaa = | 
x 4 — 
ata t3 ta +3 eT 

7. Let $S_a$ be the linear system
$$\begin{cases} x - 2y + z = 1 \\ 3x + 2y - 2z = 2 \\ 2x - y + az = 3 \end{cases}$$


a) Find all real numbers a for which the system has no solution. 
b) Find all real numbers a for which the system has a unique solution. 
c) Find all real numbers a for which the system has infinitely many solutions. 


8. Given real numbers $a, b, c, d$, solve the linear system
$$\begin{cases} x + y + z = 1 \\ ax + by + cz = d \\ a^2x + b^2y + c^2z = d^2 \end{cases}$$
9. Given real numbers $a, b, c, d, \alpha$, solve the linear system
$$\begin{cases} (1+\alpha)x + y + z + t = a \\ x + (1+\alpha)y + z + t = b \\ x + y + (1+\alpha)z + t = c \\ x + y + z + (1+\alpha)t = d \end{cases}$$


10. Find the necessary and sufficient condition satisfied by real numbers $a, b, c$ for
the system
$$\begin{cases} x - a(y+z) = 0 \\ y - b(x+z) = 0 \\ z - c(x+y) = 0 \end{cases}$$
to have a nontrivial solution.
11. Let $a, b, c$ be pairwise distinct real numbers. Solve the system
$$\begin{cases} x + ay + a^2z = a^3 \\ x + by + b^2z = b^3 \\ x + cy + c^2z = c^3 \end{cases}$$


12. Let $a, b$ be complex numbers. Solve the linear system
$$\begin{cases} ax + y + z + t = 1 \\ x + ay + z + t = b \\ x + y + az + t = b^2 \\ x + y + z + at = b^3 \end{cases}$$


Chapter 8 
Polynomial Expressions of Linear 
Transformations and Matrices 


Abstract From a theoretical point of view, this chapter is the heart of the book. 
It uses essentially all results established before to prove a great deal of surprising 
results concerning matrices. This chapter makes heavy use of basic properties of 
polynomials which are used to study the eigenvalues and eigenvectors of matrices. 


Keywords Minimal polynomial • Characteristic polynomial • Eigenvalue
• Eigenvectors


From a theoretical point of view, we now reach the heart of the book. In this chapter
we will use everything we have developed so far to study linear maps and matrices. 
To each matrix (or linear transformation of a finite dimensional vector space) we 
will associate two polynomials, the minimal and the characteristic polynomial. They 
are not enough to characterize the matrix up to similarity, but they give lots of 
nontrivial information about the matrix. We also associate a collection of scalars 
called eigenvalues of the matrix (if the field of scalars is C, the eigenvalues are 
simply the roots of the characteristic polynomial) and a collection of subspaces 
indexed by the eigenvalues and called eigenspaces. An in-depth study of these 
objects yields many deep theorems and properties of matrices. 

In this chapter we will make heavy use of basic properties of polynomials. We
recalled them in the appendix concerning algebraic prerequisites, and we strongly
advise the reader to make sure that he is familiar with these properties before starting
to read this chapter. We fix a field $F$ (the reader will not lose anything by assuming
that $F$ is either $\mathbb{R}$ or $\mathbb{C}$).


8.1 Some Basic Constructions 


Let $V$ be a vector space over a field $F$, and let $T : V \to V$ be a linear
transformation. We define a sequence $(T^i)_{i \geq 0}$ of linear transformations of $V$ by
the rule
$$T^0 = \mathrm{id}, \quad T^{i+1} = T \circ T^i$$
for $i \geq 0$, where $\mathrm{id}$ denotes the identity map (sending every vector $v$ to $v$). In other
words, $T^i$ is the $i$th iterate of $T$, for instance
$$T^3(v) = T(T(T(v)))$$
for all $v \in V$.
If $P = a_0 + a_1X + \cdots + a_nX^n \in F[X]$ we define a linear transformation $P(T)$
of $V$ by
$$P(T) := a_0T^0 + a_1T^1 + \cdots + a_nT^n.$$


The following result follows easily by unwinding definitions. We will use it 
constantly from now on, without further reference, thus the reader may want to 
take a break and check that he can actually prove it. 


Proposition 8.1. If $P_1, P_2 \in F[X]$ and $T$ is a linear transformation of $V$, then
a) $P_1(T) + P_2(T) = (P_1 + P_2)(T)$.
b) $P_1(T) \circ P_2(T) = (P_1P_2)(T)$.

We warn the reader that we do not have $P(T_1) + P(T_2) = P(T_1 + T_2)$ and
$P(T_1) \circ P(T_2) = P(T_1 \circ T_2)$ in general. For instance, take $P(X) = X^2$ and
$T_1 = T_2 = \mathrm{id}$; then (if $F$ has characteristic different from 2)
$$P(T_1) + P(T_2) = 2\,\mathrm{id} \neq 4\,\mathrm{id} = (T_1 + T_2)^2 = P(T_1 + T_2).$$
We invite the reader to find a counterexample for the equality $P(T_1) \circ P(T_2) = P(T_1 \circ T_2)$.
Definition 8.2. The $F$-algebra generated by the linear transformation $T$ is the set
$$F[T] = \{P(T) \mid P \in F[X]\}.$$


The following result follows directly from the previous proposition: 


Proposition 8.3. For all $x, y \in F[T]$ and $c \in F$ we have $x + cy \in F[T]$ and
$x \circ y \in F[T]$. Thus $F[T]$ is a subspace of the space of linear transformations on $V$,
which is stable under composition of linear transformations.


Actually, the reader can easily check that F [T] is the smallest subspace of the 
space of linear transformations on V which contains id, T and is closed under 
composition of linear transformations. 

All previous constructions and results have analogues for matrices. Namely, if 
A € Mą,(F) is a square matrix of order n with coefficients in F, we have the 
sequence (AŻ )i>o of successive powers of A, and we define for P = dy + a1 X + 
.-- +a, X” € F[X] 


P(A) := aoln +41 A+...+4,A". 


We have $P(A) \cdot Q(A) = (PQ)(A)$ for all polynomials P, Q and all matrices A.
The algebra generated by A is defined by


$F[A] = \{P(A) \mid P \in F[X]\}$.


It is a subspace of $M_n(F)$ which is stable under multiplication of matrices.
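As an illustration, for $P = a_0 + a_1X + \dots + a_dX^d$ the matrix P(A) can be computed with Horner's scheme, $P(A) = (\cdots((a_dA + a_{d-1}I_n)A + a_{d-2}I_n)\cdots)A + a_0I_n$, which needs only d matrix products. The following is a minimal sketch, assuming matrices are stored as lists of rows with rational entries (exact arithmetic via Python's `fractions.Fraction`) and polynomials as coefficient lists with the constant term first; the function names are hypothetical helpers, not notation from the text. It also checks the identity $P(A)Q(A) = (PQ)(A)$ on one example.

```python
from fractions import Fraction


def identity(n):
    """The n x n identity matrix I_n, as a list of rows."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, A):
    """P(A) = a_0 I_n + a_1 A + ... + a_d A^d by Horner's scheme.

    coeffs = [a_0, a_1, ..., a_d], constant term first.
    """
    n = len(A)
    I = identity(n)
    result = [[Fraction(0)] * n for _ in range(n)]
    for a in reversed(coeffs):
        # result <- result * A + a * I_n
        result = mat_mul(result, A)
        result = [[result[i][j] + a * I[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def poly_mul(P, Q):
    """Coefficients of the product PQ (constant term first)."""
    out = [0] * (len(P) + len(Q) - 1)
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            out[i + j] += p * q
    return out


A = [[1, 2], [3, 4]]
P = [1, -1, 1]   # X^2 - X + 1
Q = [2, 1]       # X + 2
assert poly_at_matrix(P, A) == [[7, 8], [12, 19]]
# Matrix form of Proposition 8.1 b): P(A) Q(A) = (PQ)(A)
assert (mat_mul(poly_at_matrix(P, A), poly_at_matrix(Q, A))
        == poly_at_matrix(poly_mul(P, Q), A))
```

Horner's scheme is chosen here only because it avoids storing all the powers $A^i$; summing $a_iA^i$ directly gives the same matrix.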
Remark 8.4. If A is the matrix of some linear transformation T of V in some basis
of V, then P(A) is the matrix of P(T) in that basis.


Problem 8.5. a) Let $A, B \in M_n(F)$ be matrices, with B invertible. Prove that for
any $P \in F[X]$ we have


$P(BAB^{-1}) = BP(A)B^{-1}$.


b) Prove that if $A, B \in M_n(F)$ are similar matrices, then P(A) and P(B) are
similar matrices for all $P \in F[X]$.


Solution. a) Suppose first that $P(X) = X^k$ for some $k \ge 1$; we need to prove that
$(BAB^{-1})^k = BA^kB^{-1}$. Using that $B^{-1}B = I_n$ several times, we obtain


$(BAB^{-1})^k = BAB^{-1}BAB^{-1}\cdots BAB^{-1}$
$= BA^2B^{-1}BAB^{-1}\cdots BAB^{-1} = \dots = BA^kB^{-1}$.
For $k = 0$ both sides equal $I_n$. In general, write $P(X) = a_0 + a_1X + \dots + a_kX^k$; then


$P(BAB^{-1}) = \sum_{i=0}^{k} a_i(BAB^{-1})^i = \sum_{i=0}^{k} a_iBA^iB^{-1}$
$= B\Big(\sum_{i=0}^{k} a_iA^i\Big)B^{-1} = BP(A)B^{-1}$,


and the problem is solved.
b) Write $A = CBC^{-1}$ for some invertible matrix C. Then by part a)


$P(A) = P(CBC^{-1}) = CP(B)C^{-1}$,


thus P(A) and P(B) are similar. □
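A quick numerical sanity check of part a) can be run in exact rational arithmetic. The sketch below is illustrative only: the matrices, the polynomial and the helper names (`mat_mul`, `poly_at_matrix`, `inverse_2x2`) are hypothetical choices, and the inverse is computed with the usual $2 \times 2$ adjugate formula.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, A):
    """P(A) by Horner's scheme; coeffs = [a_0, ..., a_d], constant term first."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = mat_mul(result, A)      # result <- result * A
        for i in range(n):
            result[i][i] += a            # ... + a * I_n
    return result


def inverse_2x2(B):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = B
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("B is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 2], [0, 3]]
B = [[2, 1], [1, 1]]          # det B = 1, so B is invertible
P = [5, 0, -2, 1]             # P(X) = X^3 - 2X^2 + 5
B_inv = inverse_2x2(B)

lhs = poly_at_matrix(P, mat_mul(mat_mul(B, A), B_inv))   # P(B A B^{-1})
rhs = mat_mul(mat_mul(B, poly_at_matrix(P, A)), B_inv)   # B P(A) B^{-1}
assert lhs == rhs
```

A single example of course proves nothing; it only confirms that the algebra in part a) matches a direct computation.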


8.1.1 Problems for Practice 


1. Prove Proposition 8.1. 
2. Let


$A = \begin{pmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ -2 & 2 & -3 \end{pmatrix}$.


Compute P(A), where $P(X) = X^2 - X + 1$.
3. Let a, b, c be real numbers and let


$A = \begin{pmatrix} 0 & -b & c \\ a & 0 & -c \\ -a & b & 0 \end{pmatrix}$.


Compute P(A), where
$P(X) = X(X^2 + ab + bc + ca)$.


4. Prove that the matrix


$A = \begin{pmatrix} 2 & 0 & 1 \\ -4 & 0 & -2 \\ -4 & 0 & -2 \end{pmatrix}$


is nilpotent.

5. Let $A \in M_n(F)$ be a symmetric matrix. Prove that for all $P \in F[X]$ the matrix
P(A) is symmetric.

6. Let $A \in M_n(F)$ be a diagonal matrix. Prove that for all $P \in F[X]$ the matrix
P(A) is diagonal.

7. Let $A \in M_n(F)$ be an upper-triangular matrix. Prove that for all $P \in F[X]$ the
matrix P(A) is upper-triangular.

8. Let V be the vector space of smooth functions $f : \mathbb{R} \to \mathbb{R}$ and let $T : V \to V$
be the linear transformation sending $f \in V$ to its derivative $f'$. Can we find a
nonzero polynomial $P \in \mathbb{R}[X]$ such that $P(T) = 0$?


8.2 The Minimal Polynomial of a Linear Transformation or Matrix


Let V be a finite dimensional vector space over F, say of dimension $n \ge 1$. We
will be concerned with the following problem: given a linear transformation T
of V, describe the polynomials $P \in F[X]$ for which $P(T) = 0$. Note that we
can also ask the dual question: given a polynomial $P \in F[X]$, describe the linear
transformations T for which $P(T) = 0$. This is more difficult to answer, and solving
this problem actually requires the resolution of the first problem.

So let us start with a linear transformation T of V and consider the set


$I(T) = \{P \in F[X] \mid P(T) = 0\}$.


A key observation is that I(T) is not reduced to {0}. Indeed, the space of
linear transformations on V has dimension $n^2$, thus the $n^2 + 1$ linear transformations
$\mathrm{id}, T, T^2, \dots, T^{n^2}$ cannot be linearly independent. Thus we can find $a_0, \dots, a_{n^2}$ not
all 0 such that

$a_0\,\mathrm{id} + a_1T + \dots + a_{n^2}T^{n^2} = 0$


and then $a_0 + a_1X + \dots + a_{n^2}X^{n^2}$ is a nonzero element of I(T).
Theorem 8.6. There is a unique monic (nonzero) polynomial $\mu_T \in I(T)$ such that
I(T) is the set of multiples of $\mu_T$ in F[X], i.e.


$I(T) = \mu_T \cdot F[X]$.


Proof. Proposition 8.1 implies that I(T) is a subspace of F[X] and that $PQ \in I(T)$
whenever $P \in I(T)$ and $Q \in F[X]$. Indeed,


$(PQ)(T) = P(T) \circ Q(T) = 0 \circ Q(T) = 0$.


The discussion preceding Theorem 8.6 shows that $I(T) \ne \{0\}$. Let P be a nonzero
polynomial of smallest degree in I(T). Dividing P by its leading coefficient (the
new polynomial is still in I(T) and has the same degree as P), we may assume that
P is monic. By the first paragraph, all multiples of P belong to I(T). Conversely,
let S be an element of I(T) and write $S = QP + R$ with $Q, R \in F[X]$ and
$\deg R < \deg P$. Note that $R = S - QP \in I(T)$ since I(T) is a subspace of F[X]
and $S, QP \in I(T)$. If $R \ne 0$, then since $\deg R < \deg P$ we obtain a contradiction
with the minimality of P. Thus R = 0 and P divides S. It follows that I(T) is
precisely the set of multiples of P and so we can take $\mu_T = P$.

Finally, we need to prove that $\mu_T$ is unique. If S had the same properties, then
S would be a multiple of $\mu_T$ and $\mu_T$ would be a multiple of S. Since they are both
monic, they must be equal. □
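The proof suggests a direct, if slow, way to compute $\mu_A$ for a concrete matrix A: find the smallest d such that $A^d$ is a linear combination $c_0I_n + \dots + c_{d-1}A^{d-1}$; then $\mu_A = X^d - c_{d-1}X^{d-1} - \dots - c_0$, because $I_n, A, \dots, A^{d-1}$ are then linearly independent and no nonzero polynomial of degree less than d kills A. The Python sketch below does this with Gauss-Jordan elimination over the rationals and relies only on the bound $d \le n^2$ established above. The helper names are hypothetical, and this brute-force search is an illustration, not the more efficient approach via the characteristic polynomial mentioned in the text.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve_in_span(columns, target):
    """Return c with sum_j c[j] * columns[j] == target, or None if impossible.

    Gauss-Jordan elimination on the augmented system, over the rationals.
    """
    m, k = len(target), len(columns)
    rows = [[Fraction(columns[j][i]) for j in range(k)] + [Fraction(target[i])]
            for i in range(m)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue                       # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[k] != 0 and all(x == 0 for x in row[:k]) for row in rows):
        return None                        # some equation reads 0 = nonzero
    sol = [Fraction(0)] * k
    for i, c in enumerate(pivots):
        sol[c] = rows[i][k]
    return sol


def minimal_polynomial(A):
    """mu_A as a coefficient list [c_0, ..., c_{d-1}, 1], constant term first."""
    n = len(A)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    powers = [identity]                    # I_n, A, A^2, ...
    for d in range(1, n * n + 1):
        powers.append(mat_mul(powers[-1], A))
        flat = [[x for row in M for x in row] for M in powers]
        # Is A^d a linear combination of I_n, A, ..., A^(d-1)?
        coeffs = solve_in_span(flat[:-1], flat[-1])
        if coeffs is not None:
            # A^d = sum_j c_j A^j, hence mu_A = X^d - sum_j c_j X^j
            return [-c for c in coeffs] + [Fraction(1)]
    raise AssertionError("unreachable: I_n, ..., A^(n^2) are dependent")


assert minimal_polynomial([[0, -1], [1, 0]]) == [1, 0, 1]      # X^2 + 1
assert minimal_polynomial([[2, 1], [0, 2]]) == [4, -4, 1]      # (X - 2)^2
```

Since all computations stay in the rationals, the coefficients produced for a rational matrix are rational, which is consistent with the field-independence result proved later in this section.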


Definition 8.7. The polynomial $\mu_T$ is called the minimal polynomial of T.


Due to its importance, let us stress again the properties of the minimal
polynomial $\mu_T$:


• it is monic and satisfies $\mu_T(T) = 0$;
• for any polynomial $P \in F[X]$, we have $P(T) = 0$ if and only if $\mu_T$
divides P.


The whole theory developed above applies verbatim to matrices: if
$A \in M_n(F)$, there is a unique monic polynomial $\mu_A \in F[X]$ with the following
properties:


• $\mu_A(A) = O_n$, and
• if $P \in F[X]$, then $P(A) = O_n$ if and only if $\mu_A$ divides P.


Remark 8.8. If P is a polynomial and A is a matrix (or a linear transformation)
satisfying $P(A) = O_n$, we will sometimes say that P kills A or that A is killed
by P. Thus the polynomials killing A are precisely the multiples of the minimal
polynomial of A.


The discussion preceding Theorem 8.6 shows that we can find a nonzero
polynomial P of degree not exceeding $n^2$ such that $P(T) = 0$. Since $\mu_T$ divides P,
it follows that $\deg \mu_T \le n^2$. This bound is fairly weak, and the goal of the next
sections is to introduce a second polynomial canonically associated with T, the
characteristic polynomial of T. This will be monic of degree n and will also vanish
when evaluated at T. This will yield the inequality $\deg \mu_T \le n$, which is essentially
optimal.
Let us give a few examples of computations of minimal polynomials:


Example 8.9. Let F be a field. All matrices below are supposed to have entries
in F.


a) The minimal polynomial of the matrix $O_n$ is clearly $\mu_{O_n} = X$. More generally,
the minimal polynomial of the scalar matrix $cI_n$ is $X - c$.

b) Consider some elements $d_1, \dots, d_n \in F$ and a diagonal matrix $A = [a_{ij}]$, with
$a_{ii} = d_i$. The elements $d_1, \dots, d_n$ are not necessarily pairwise distinct, so let
us assume that $d_{i_1}, \dots, d_{i_k}$ is the largest collection of pairwise distinct elements
among $d_1, \dots, d_n$. Note that for any polynomial $Q \in F[X]$ the matrix Q(A)
is simply the diagonal matrix with diagonal entries $Q(d_1), \dots, Q(d_n)$. Thus
$Q(A) = O_n$ if and only if $Q(d_i) = 0$ for $1 \le i \le n$. This happens if and
only if $Q(d_{i_1}) = \dots = Q(d_{i_k}) = 0$. Since $d_{i_1}, \dots, d_{i_k}$ are pairwise distinct, this
is further equivalent to $(X - d_{i_1})\cdots(X - d_{i_k}) \mid Q$. Thus the minimal polynomial
of A is


$\mu_A(X) = (X - d_{i_1})\cdots(X - d_{i_k})$.


In particular, we see that $d_1, \dots, d_n$ are pairwise distinct if and only if $\mu_A$ has
degree n, in which case $\mu_A = \prod_{i=1}^{n}(X - d_i)$.

c) Suppose that $F = \mathbb{R}$ and that a matrix $A \in M_n(F)$ satisfies $A^2 + I_n = O_n$. What
is the minimal polynomial $\mu_A$ of A? For sure it divides $X^2 + 1$, since $X^2 + 1$
vanishes at A. The only monic nonconstant divisor of $X^2 + 1$ in $\mathbb{R}[X]$ is $X^2 + 1$
itself, and $\mu_A$ is nonconstant (the constant polynomial 1 does not kill A), thus
necessarily $\mu_A = X^2 + 1$.

d) With the tools introduced so far it is not easy at all to compute the minimal
polynomial of a given matrix. We will introduce in the next sections another
polynomial (called the characteristic polynomial of the matrix) which can be
directly computed (via the computation of a determinant) from the matrix and
which is always a multiple of the minimal polynomial. This makes the computation
of the minimal polynomial much easier: one computes the characteristic
polynomial P of the matrix, then looks at all possible monic divisors Q of P and
checks which one kills the matrix and has the smallest degree. We will see later
on that one does not really need to check all possible divisors, which makes the
computation even faster.
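Part b) of the example translates into a few lines of code: for a diagonal matrix, Q(A) is diagonal with entries $Q(d_i)$, so no matrices need to be formed at all. The Python sketch below (hypothetical helper names, exact rational arithmetic) builds $\prod (X - d)$ over the distinct diagonal entries and checks that it kills A, while removing any one factor does not.

```python
from fractions import Fraction


def poly_from_roots(roots):
    """Coefficients (constant term first) of the product of (X - r) over roots."""
    coeffs = [Fraction(1)]
    for r in roots:
        shifted = [Fraction(0)] + coeffs                  # X * current
        scaled = [-r * c for c in coeffs] + [Fraction(0)]  # -r * current
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


def evaluate(coeffs, x):
    """Q(x) by Horner's scheme."""
    value = Fraction(0)
    for c in reversed(coeffs):
        value = value * x + c
    return value


def kills_diagonal(coeffs, diagonal):
    """Q(A) = diag(Q(d_1), ..., Q(d_n)), so Q kills A iff every Q(d_i) is 0."""
    return all(evaluate(coeffs, d) == 0 for d in diagonal)


def minimal_polynomial_of_diagonal(diagonal):
    """mu_A for A = diag(d_1, ..., d_n): product of (X - d) over distinct d."""
    distinct = list(dict.fromkeys(diagonal))   # drop repeats, keep order
    return poly_from_roots(distinct)


diagonal = [2, 2, 3]
mu = minimal_polynomial_of_diagonal(diagonal)   # X^2 - 5X + 6
assert mu == [6, -5, 1]
assert kills_diagonal(mu, diagonal)
# Dropping either factor gives a polynomial that no longer kills A.
assert not kills_diagonal(poly_from_roots([2]), diagonal)
assert not kills_diagonal(poly_from_roots([3]), diagonal)
```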




Problem 8.10. Let T be a linear transformation on a finite dimensional F-vector
space V and let $V = V_1 \oplus V_2$ be a decomposition of V into subspaces which
are stable under T. Let $P, P_1, P_2$ be the minimal polynomials of T, $T|_{V_1}$ and $T|_{V_2}$
respectively. Prove that P is the least common multiple of $P_1$ and $P_2$.


Solution. Let Q be the (monic) least common multiple of $P_1$ and $P_2$. Since P kills T, it also
kills $T|_{V_1}$ and $T|_{V_2}$, thus it is a multiple of $P_1$ and $P_2$. It follows that Q divides P.
In order to prove that P divides Q, it suffices to prove that Q kills T. But since
Q is a multiple of $P_1$ and $P_1$ kills $T|_{V_1}$, it follows that Q kills $T|_{V_1}$. Similarly, Q
kills $T|_{V_2}$. Since $V = V_1 \oplus V_2$, any $v \in V$ can be written $v = v_1 + v_2$ with $v_i \in V_i$,
and then $Q(T)(v) = Q(T)(v_1) + Q(T)(v_2) = 0$. Thus Q kills T, so P divides Q, and since
P and Q are both monic, P = Q. □
A natural and rather subtle problem is the following: suppose that $A \in M_n(\mathbb{Q})$ is
a matrix with rational entries. We can definitely see A as a matrix with real entries,
i.e., as an element of $M_n(\mathbb{R})$, or as a matrix with complex entries, i.e., as an element
of $M_n(\mathbb{C})$. Thus we can attach to A three minimal polynomials! Fortunately, the
following theorem shows that the three polynomials are actually one and the same:
the minimal polynomial of a matrix does not depend on the field containing the
entries of the matrix.


Theorem 8.11. Let $F_1 \subset F_2$ be two fields and let $A \in M_n(F_1)$. Then the minimal
polynomial of A seen as an element of $M_n(F_1)$ and that of A seen as an element of $M_n(F_2)$
coincide.


Proof. Let $\mu_1$ be the minimal polynomial of $A \in M_n(F_1)$ and $\mu_2$ that of
$A \in M_n(F_2)$. Since $F_1[X] \subset F_2[X]$, the polynomial $\mu_1$ belongs to $F_2[X]$ and
kills A, thus it must be a multiple of $\mu_2$. In other words, $\mu_2$ divides $\mu_1$. Let
$d_i = \deg \mu_i$. It suffices to prove that $d_2 \ge d_1$, and for this it suffices to prove that
there is a nonzero polynomial $P \in F_1[X]$ of degree at most $d_2$ which vanishes at A
(as such a polynomial is necessarily a multiple of $\mu_1$). By hypothesis, we know that
we have a relation


$a_0I_n + a_1A + \dots + a_{d_2}A^{d_2} = O_n$,


with $a_i \in F_2$ (namely the coefficients of $\mu_2$). This is equivalent to $n^2$ linear
homogeneous equations in the unknowns $a_0, \dots, a_{d_2}$. The coefficients of these
equations are entries of the matrices $I_n, A, \dots, A^{d_2}$, so they belong to $F_1$. So we have
a linear homogeneous system with coefficients in $F_1$ and having a nontrivial solution
in $F_2$. Then it automatically has a nontrivial solution in $F_1$ (by Theorem 7.77),
giving the desired polynomial P. □


We end this section with a series of problems related to the pointwise minimal
polynomial. Let V be a finite dimensional vector space over a field F and let
$T : V \to V$ be a linear transformation with minimal polynomial $\mu_T$. For $x \in V$,
consider


$I_x = \{P \in F[X] \mid P(T)(x) = 0\}$.
Note that the sum and difference of two elements of $I_x$ are still in $I_x$.


Problem 8.12. Prove that there is a unique monic polynomial $\mu_x \in F[X]$ such that
$I_x$ is the set of multiples of $\mu_x$ in F[X]. Moreover, $\mu_x$ divides $\mu_T$.


Solution. We may assume that $x \ne 0$ (for x = 0 we have $I_x = F[X]$ and $\mu_x = 1$).
Note that $\mu_T \in I_x$, since $\mu_T(T) = 0$; in particular $I_x \ne \{0\}$. Let
$\mu_x$ be a monic polynomial of smallest degree which belongs to $I_x$. We will prove
that $I_x = \mu_xF[X]$.

First, if $P \in \mu_xF[X]$, then $P = \mu_xQ$ for some $Q \in F[X]$ and


$P(T)(x) = Q(T)(\mu_x(T)(x)) = Q(T)(0) = 0$,


thus $P \in I_x$. This shows that $\mu_xF[X] \subset I_x$.
Conversely, let $P \in I_x$ and, using the division algorithm, write $P = Q\mu_x + R$ for
some polynomials $Q, R \in F[X]$ such that $\deg R < \deg \mu_x$. Assume that $R \ne 0$.
Since P and $Q\mu_x$ belong to $I_x$ (the second one by the previous paragraph), we
deduce that $R \in I_x$. Let a be the leading coefficient of R; then $\frac{1}{a}R$ is a monic
polynomial belonging to $I_x$ and with degree less than that of $\mu_x$, a contradiction.
Thus R = 0 and $\mu_x$ divides P. Uniqueness follows as in Theorem 8.6, since two monic
polynomials dividing each other are equal, and $\mu_x$ divides $\mu_T$ because $\mu_T \in I_x$.
This finishes the solution. □


Problem 8.13. Let T be a linear transformation on a finite dimensional vector
space V over F, where F is an arbitrary field.


a) Prove that if $\mu_T = P^kQ$ with $k \ge 1$, $P \in F[X]$ monic irreducible and Q relatively
prime to P, then we can find $x \in V$ such that $\mu_x = P^k$.
b) Prove that if $x_1, x_2 \in V$ are such that $\mu_{x_1}$ and $\mu_{x_2}$ are relatively prime, then
$\mu_{x_1 + x_2} = \mu_{x_1}\mu_{x_2}$.
c) Conclude that there is always a vector $x \in V$ such that $\mu_x = \mu_T$.


Solution. a) Suppose on the contrary that $\mu_x \ne P^k$ for all $x \in V$. Let $x \in V$.
Then by hypothesis $(P^kQ)(T)(x) = 0$. Hence $v = Q(T)(x)$ lies in the kernel
of $P^k(T)$ and so $\mu_v$ divides $P^k$. Since $\mu_v \ne P^k$ (by our assumption applied to v)
and P is irreducible, $\mu_v$ divides $P^{k-1}$ and so $P^{k-1}(T)(v) = 0$, that is
$(P^{k-1}Q)(T)(x) = P^{k-1}(T)(v) = 0$. But since x was arbitrary, this means
$\mu_T \mid P^{k-1}Q$, a contradiction, since $P^{k-1}Q$ is nonzero of degree less than $\deg \mu_T$.

b) Let $P_1 = \mu_{x_1}$ and $P_2 = \mu_{x_2}$ and let $P = P_1P_2$. Since P is a multiple of both
$P_1$ and $P_2$, we deduce that P(T) vanishes at both $x_1$ and $x_2$, thus it vanishes at
$x_1 + x_2$ and so $\mu_{x_1+x_2} \mid P$.

On the other hand, $\mu_{x_1+x_2}(T)(x_1 + x_2) = 0$, thus


$(P_1\mu_{x_1+x_2})(T)(x_1) + (P_1\mu_{x_1+x_2})(T)(x_2) = 0$.


The first term in the sum is 0, since $P_1(T)(x_1) = 0$, thus the second term must be
0, which means that $\mu_{x_2} = P_2 \mid P_1\mu_{x_1+x_2}$. Since $P_1$ and $P_2$ are relatively prime,
it follows that $P_2$ divides $\mu_{x_1+x_2}$, and by symmetry $P_1$ also divides $\mu_{x_1+x_2}$. Using
again that $P_1$ and $P_2$ are relatively prime, we conclude that $P = P_1P_2$ divides
$\mu_{x_1+x_2}$. Combining this with the divisibility $\mu_{x_1+x_2} \mid P$ and using that $\mu_{x_1+x_2}$
and P are both monic, the result follows.

c) Consider the decomposition $\mu_T = P_1^{k_1}\cdots P_r^{k_r}$ of $\mu_T$ into irreducible polynomials.
Here $P_1, \dots, P_r$ are pairwise relatively prime monic irreducible polynomials and
$k_i$ are positive integers. By part a) we can find $x_i \in V$ such that $\mu_{x_i} = P_i^{k_i}$.
Applying successively part b), we obtain


$\mu_{x_1 + \dots + x_r} = \mu_{x_1}\cdots\mu_{x_r} = P_1^{k_1}\cdots P_r^{k_r} = \mu_T$


and so we can take $x = x_1 + \dots + x_r$.
□


Problem 8.14. Let $V_x$ be the span of $x, T(x), T^2(x), \dots$. Prove that $V_x$ is a
subspace of V of dimension $\deg \mu_x$, stable under T.
Solution. It is clear that $V_x$ is a subspace of V, stable under T. Let $d = \deg \mu_x$.
We will prove that $x, T(x), \dots, T^{d-1}(x)$ form a basis of $V_x$, which will yield the
desired result.

Suppose first that $a_0x + a_1T(x) + \dots + a_{d-1}T^{d-1}(x) = 0$ for some scalars
$a_0, \dots, a_{d-1}$, not all of them equal to 0. The polynomial $P = a_0 + a_1X + \dots +
a_{d-1}X^{d-1}$ is then nonzero and belongs to $I_x$ (i.e., $P(T)(x) = 0$). Thus P is a
multiple of $\mu_x$, which is impossible as it is nonzero and its degree is less than that
of $\mu_x$. Thus $x, T(x), \dots, T^{d-1}(x)$ are linearly independent over F.

Let W be the span of $x, T(x), \dots, T^{d-1}(x)$. We claim that W is stable under T.
It suffices to check that $T^d(x)$ belongs to W. But since $\mu_x(T)(x) = 0$ and $\mu_x$ is
monic of degree d, we know that there are scalars $b_0, \dots, b_{d-1}$ such that


$T^d(x) + b_{d-1}T^{d-1}(x) + \dots + b_0x = 0$.


This relation shows that $T^d(x)$ is a linear combination of $x, T(x), \dots, T^{d-1}(x)$ and
so it belongs to W.

Now, since W is stable under T and contains x, it must contain all $T^k(x)$ for
$k \ge 0$, thus W must also contain $V_x$. It follows that $x, T(x), \dots, T^{d-1}(x)$ is a
generating subset of $V_x$, and the proof is finished, since we have already shown that
this set is linearly independent. □
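Problems 8.12 to 8.14 also describe an algorithm: $\mu_x$ is found by computing the vectors $x, Ax, A^2x, \dots$ (often called the Krylov sequence of x) until the first one that is a linear combination of the previous ones, and the number of independent vectors collected is $\dim V_x = \deg \mu_x$. The Python sketch below works with a matrix A in place of T; the helper names are hypothetical and the linear algebra is plain Gauss-Jordan elimination over the rationals. The example also illustrates Problem 8.13 c): for $A = \mathrm{diag}(1, 2, 3)$, the vector $x = (1, 1, 1)$ satisfies $\mu_x = \mu_A$.

```python
from fractions import Fraction


def mat_vec(A, v):
    """The vector A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def solve_in_span(columns, target):
    """Return c with sum_j c[j] * columns[j] == target, or None if impossible."""
    m, k = len(target), len(columns)
    rows = [[Fraction(columns[j][i]) for j in range(k)] + [Fraction(target[i])]
            for i in range(m)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[k] != 0 and all(x == 0 for x in row[:k]) for row in rows):
        return None                      # some equation reads 0 = nonzero
    sol = [Fraction(0)] * k
    for i, c in enumerate(pivots):
        sol[c] = rows[i][k]
    return sol


def pointwise_minimal_polynomial(A, x):
    """mu_x, the monic generator of {P : P(A) x = 0}, constant term first."""
    if all(c == 0 for c in x):
        return [Fraction(1)]             # I_x = F[X] when x = 0
    krylov = [[Fraction(c) for c in x]]  # x, A x, A^2 x, ...
    while True:
        nxt = mat_vec(A, krylov[-1])
        coeffs = solve_in_span(krylov, nxt)
        if coeffs is not None:
            # A^d x = sum_j c_j A^j x, hence mu_x = X^d - sum_j c_j X^j
            return [-c for c in coeffs] + [Fraction(1)]
        krylov.append(nxt)               # still independent: dim V_x grows


A = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
assert pointwise_minimal_polynomial(A, [1, 0, 0]) == [-1, 1]          # X - 1
assert pointwise_minimal_polynomial(A, [1, 1, 0]) == [2, -3, 1]       # (X-1)(X-2)
assert pointwise_minimal_polynomial(A, [1, 1, 1]) == [-6, 11, -6, 1]  # = mu_A
```

The loop always stops, because at most n vectors of $F^n$ can be linearly independent.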


8.2.1 Problems for Practice 


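1. Compute the minimal polynomial of the following matrices:


$A = \begin{pmatrix} 2 & 3 \\ \ldots & 5 \end{pmatrix}, \quad A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}$.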

2. Compute the minimal polynomial of the matrix $A \in M_n(\mathbb{R})$ all of whose entries
are equal to 1.
3. Let $A \in M_n(\mathbb{C})$. Prove that
$\dim \operatorname{Span}(I_n, A, A^2, \dots) = \deg \mu_A$.


4. Find a matrix $A \in M_2(\mathbb{R})$ whose minimal polynomial is


a) $X^2 - 3X + 2$.
b) $X^2$.
c) $X^2 + 1$.


5. Let V be a finite dimensional vector space over F and let $T : V \to V$ be an
invertible linear transformation. Prove that $T^{-1} \in F[T]$.




6. For which positive integers n can we find a matrix $A \in M_n(\mathbb{R})$ whose minimal
polynomial is $X^2 + 1$?

7. Compute the minimal polynomial of the projection/symmetry of $\mathbb{C}^n$ onto a
subspace along a complementary subspace.

8. Let $T : M_n(\mathbb{C}) \to M_n(\mathbb{C})$ be the map sending a matrix to its transpose. Find the
minimal polynomial of T.

9. Let $T : M_n(\mathbb{C}) \to M_n(\mathbb{C})$ be the map sending a matrix $A = [a_{ij}]$ to the matrix
$\overline{A} = [\overline{a_{ij}}]$, where $\overline{z}$ is the complex conjugate of z. Find the minimal polynomial
of T.

10. Describe the minimal polynomial of a matrix $A \in M_n(\mathbb{C})$ of rank 1.


8.3 Eigenvectors and Eigenvalues 


Let V be a vector space over a field F and let T be a linear transformation of V.
In this section we will be interested in those $\lambda \in F$ for which $\lambda \cdot \mathrm{id} - T$ is not
invertible. The following definition is fundamental.


Definition 8.15. An eigenvalue of T is a scalar $\lambda \in F$ such that $\lambda \cdot \mathrm{id} - T$
is not invertible. An eigenvector of T corresponding to the eigenvalue $\lambda$ (or
$\lambda$-eigenvector) is any nonzero element of the space $\ker(\lambda \cdot \mathrm{id} - T)$, which is called
the eigenspace corresponding to $\lambda$ (or the $\lambda$-eigenspace).


Thus a $\lambda$-eigenvector v is by definition nonzero and satisfies
$T(v) = \lambda v$,


and the $\lambda$-eigenspace consists of the vector 0 and all $\lambda$-eigenvectors.
We have the analogous definition for matrices:


Definition 8.16. Let $A \in M_n(F)$ be a square matrix. A scalar $\lambda \in F$ is called an
eigenvalue of A if there is a nonzero vector $X \in F^n$ such that $AX = \lambda X$. In this
case, the subspace


$\ker(\lambda I_n - A) := \{X \in F^n \mid AX = \lambda X\}$


is called the $\lambda$-eigenspace of A.


It is an easy but important exercise for the reader to check that the two
definitions are compatible, in the following sense: let V be finite dimensional and
let $T : V \to V$ be a linear transformation. Choose any basis of V and let A be the
matrix of T with respect to this basis. Then the eigenvalues of T are exactly the
eigenvalues of A.
Example 8.17. Consider the matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. Let us find the eigenvalues and
the eigenspaces of A, if we consider A as a matrix with complex entries. Let $\lambda$ be
an eigenvalue and let X be a nonzero vector such that $AX = \lambda X$. If $x_1, x_2$ are the
coordinates of X, the condition $AX = \lambda X$ is equivalent to the equations


$-x_2 = \lambda x_1, \quad x_1 = \lambda x_2$.
We deduce that
$-x_2 = \lambda^2 x_2$.


If $x_2 = 0$, then $x_1 = 0$ and X = 0, a contradiction. Thus $x_2 \ne 0$ and necessarily
$\lambda^2 = -1$, that is $\lambda \in \{-i, i\}$. Conversely, i and -i are both eigenvalues, since we
can choose $x_2 = 1$ and $x_1 = \lambda$ as a solution of the previous system. Actually the
$\lambda$-eigenspace is given by


$\ker(\lambda I_2 - A) = \{(\lambda x_2, x_2) \mid x_2 \in \mathbb{C}\}$


and it is the line spanned by $v = (\lambda, 1) \in \mathbb{C}^2$. Thus, seen as a complex matrix, A has
two eigenvalues $\pm i$, and the eigenspaces are the lines spanned by $(i, 1)$ and $(-i, 1)$.

We now see A as a matrix with real entries and we ask the same question.
Letting $\lambda \in \mathbb{R}$ be an eigenvalue and X an eigenvector as above, the same
computations yield


$(\lambda^2 + 1)x_2 = 0$.


Since $\lambda$ is real, $\lambda^2 + 1$ is nonzero and so $x_2 = 0$, then $x_1 = 0$ and X = 0, a
contradiction. The conclusion is that, seen as a matrix with real entries, A has no
eigenvalue, thus no eigenspace. This example shows that eigenvalues and eigenspaces are very
sensitive to the field of scalars.
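Both computations can be checked mechanically. For a $2 \times 2$ matrix, expanding $\det(\lambda I_2 - A)$ gives $\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21})$ (this anticipates Proposition 8.18 below), so the complex eigenvalues come from the quadratic formula, while real ones exist only when the discriminant is nonnegative. The Python sketch below uses floating-point complex numbers from the standard `cmath` module; the function names are hypothetical.

```python
import cmath
import math


def eigenvalues_2x2_complex(A):
    """Both roots of lambda^2 - tr(A) lambda + det(A), as complex numbers."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)


def eigenvalues_2x2_real(A):
    """Real roots only: over F = R these are all the eigenvalues of A."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        return ()                        # no real eigenvalue at all
    root = math.sqrt(disc)
    return tuple(sorted({(tr - root) / 2, (tr + root) / 2}))


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[0, -1], [1, 0]]
print(eigenvalues_2x2_complex(A))        # the two eigenvalues +i and -i
print(eigenvalues_2x2_real(A))           # () : no real eigenvalue
for lam in (1j, -1j):
    v = [lam, 1]                         # spans the lambda-eigenspace
    assert mat_vec(A, v) == [lam * c for c in v]
```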


Given a matrix $A \in M_n(F)$, how can we find its eigenvalues and its eigenspaces?
The first part is much harder than the second one. Indeed, finding eigenspaces is
equivalent to solving linear systems of the form $AX = \lambda X$, which is not (too)
difficult. On the other hand, finding eigenvalues comes down to solving polynomial
equations, which is quite hard (but can be done approximately with the help of a
computer as long as we are not interested in exact formulae). In practice (and for
reasonably sized matrices) we use the following fundamental observation in order
to compute eigenvalues:


Proposition 8.18. A scalar $\lambda \in F$ is an eigenvalue of $A \in M_n(F)$ if and only if
$\det(\lambda I_n - A) = 0$.


Proof. $\lambda I_n - A$ is not invertible if and only if its determinant vanishes. The result
follows. □
Let us come back to our problem of computing the eigenvalues of a matrix. If


$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$,


where $a_{ij} \in F$ for $i, j = 1, 2, \dots, n$, then the proposition says that we can find the
eigenvalues of A by solving the polynomial equation


$\begin{vmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{vmatrix} = 0$


in F. This is a polynomial equation of degree n. If the degree is greater than 4, there
is no general solution in terms of radicals (of course, there are instances in which
one can solve the equations in terms of radicals, but most of the time this will not
happen).


Example 8.19. Let us find the eigenvalues of $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}$. We need to
solve the equation


$\begin{vmatrix} \lambda - 1 & 0 & 0 \\ 0 & \lambda & 1 \\ 0 & -1 & \lambda \end{vmatrix} = 0$.


Expanding with respect to the first column and doing the computations, we obtain
the equivalent equation


$(\lambda - 1)(\lambda^2 + 1) = 0$.


Next, we recall that eigenvalues are sensitive to the field of scalars. Since nothing
was said about the field of scalars in this problem, we consider two cases. If we take
the field of scalars to be $\mathbb{C}$, then the eigenvalues are $1, \pm i$, which are the complex
solutions of the equation $(\lambda - 1)(\lambda^2 + 1) = 0$. If the field of scalars is $\mathbb{R}$, then the
only eigenvalue of A is 1.
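For matrices with rational entries, the coefficients of $\det(\lambda I_n - A)$ can also be obtained without expanding the determinant by hand. The Python sketch below uses the Faddeev-LeVerrier recurrence, which is a different technique from the cofactor expansion used in Example 8.19: starting from $M_0 = O_n$ and $c_n = 1$, it sets $M_k = AM_{k-1} + c_{n-k+1}I_n$ and $c_{n-k} = -\frac{1}{k}\operatorname{tr}(AM_k)$ for $k = 1, \dots, n$. Because it divides by $1, \dots, n$, it is only meant for fields such as $\mathbb{Q}$ where these integers are invertible. The helper names are hypothetical.

```python
from fractions import Fraction


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(lambda I_n - A) (Faddeev-LeVerrier)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * n + [Fraction(1)]          # c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]      # M_0 = O_n
    for k in range(1, n + 1):
        M = mat_mul(A, M)                          # M_k = A M_{k-1} ...
        for i in range(n):
            M[i][i] += c[n - k + 1]                # ... + c_{n-k+1} I_n
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k   # -tr(A M_k) / k
    return c


def evaluate(coeffs, x):
    value = 0
    for a in reversed(coeffs):
        value = value * x + a
    return value


A = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]             # the matrix of Example 8.19
p = char_poly(A)
assert p == [-1, 1, -1, 1]      # lambda^3 - lambda^2 + lambda - 1 = (lambda - 1)(lambda^2 + 1)
assert evaluate(p, 1) == 0      # 1 is a root; the factor lambda^2 + 1 has no real root
```

Finding the roots of the resulting polynomial is a separate problem, and, as noted above, it is the genuinely hard part.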


Remark 8.20. Let us mention two important and interesting consequences of
Proposition 8.18 and the discussion following it:


• For any matrix $A \in M_n(F)$, A and its transpose ${}^tA$ have the same eigenvalues.
Indeed, for $\lambda \in F$ we have


$\det(\lambda I_n - A) = \det({}^t(\lambda I_n - A)) = \det(\lambda I_n - {}^tA)$,
thus $\det(\lambda I_n - A) = 0$ if and only if $\det(\lambda I_n - {}^tA) = 0$.
• Any matrix $A \in M_n(F)$ has finitely many eigenvalues, since they are all
solutions of a polynomial equation of degree n, namely $\det(\lambda I_n - A) = 0$.
Actually, since a polynomial of degree n has at most n distinct roots, we deduce
that any matrix $A \in M_n(F)$ has at most n eigenvalues.


We can restate part of the previous remark in terms of linear transformations: 


Corollary 8.21. Let V be a finite dimensional vector space over F and let $T :
V \to V$ be a linear transformation. Then T has only finitely many (actually at most
dim V) distinct eigenvalues.


Remark 8.22. On the other hand, a linear transformation on an infinite dimensional
vector space may very well have infinitely many eigenvalues. Consider for instance
the space V of all smooth functions $f : \mathbb{R} \to \mathbb{R}$, and consider the map $T : V \to V$
sending f to its derivative. Then $f_a : x \mapsto e^{ax}$ is an eigenvector with eigenvalue a
for all $a \in \mathbb{R}$, thus any real number is an eigenvalue for T.


The following important problem shows that it is very easy to describe the 
eigenvalues of an upper-triangular matrix: 


Problem 8.23. Let $A = [a_{ij}]$ be an upper-triangular matrix in $M_n(F)$. Prove that
the eigenvalues of A are precisely its diagonal elements.


Solution. By definition, $\lambda \in F$ is an eigenvalue of A if and only if $\lambda I_n - A$ is
not invertible. The matrix $\lambda I_n - A$ is also upper-triangular, with diagonal elements
$\lambda - a_{ii}$. But an upper-triangular matrix is invertible if and only if its diagonal entries
are nonzero (because its determinant equals the product of the diagonal entries by
Theorem 7.41). The result follows. □


Problem 8.24. Find the eigenvalues of $A^6$, where


$A = \begin{pmatrix} 1 & 3 & 5 & 7 \\ 0 & 4 & 3 & 6 \\ 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 2 \end{pmatrix} \in M_4(\mathbb{R})$.


Solution. It is useless to compute $A^6$ explicitly: by the product rule for matrices,
the product of two upper-triangular matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ is an
upper-triangular matrix with diagonal entries $a_{ii}b_{ii}$. It follows that $A^6$ is an
upper-triangular matrix with diagonal entries $1, 4^6 = 4096, 0, 2^6 = 64$. By the previous problem,
these are also the eigenvalues of $A^6$. □
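The diagonal argument can be checked by brute force. The short Python sketch below (hypothetical helper names, integer arithmetic) raises an upper-triangular $4 \times 4$ example with diagonal $1, 4, 0, 2$ to the sixth power and confirms that the result is still upper-triangular, with diagonal entries equal to the sixth powers of the original ones.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, k):
    """A^k by repeated multiplication, starting from I_n."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result


def is_upper_triangular(M):
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(i))


A = [[1, 3, 5, 7],
     [0, 4, 3, 6],
     [0, 0, 0, 4],
     [0, 0, 0, 2]]
A6 = mat_pow(A, 6)
diagonal = [A6[i][i] for i in range(4)]
assert is_upper_triangular(A6)
assert diagonal == [1, 4096, 0, 64]      # sixth powers of 1, 4, 0, 2
print(diagonal)                          # [1, 4096, 0, 64]
```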


The next important result says that eigenvectors corresponding to different 
eigenvalues are linearly independent. 


Theorem 8.25. Let $\lambda_1, \dots, \lambda_k$ be pairwise distinct eigenvalues of a linear transformation
T. Then the $\lambda_i$-eigenspaces of T are in direct sum position.
Proof. By definition, we need to prove that if $T(v_i) = \lambda_iv_i$ and $v_1 + \dots + v_k = 0$,
then $v_1 = \dots = v_k = 0$. We will prove this by induction on k. The result is clear
when k = 1, so assume that it holds for k - 1 and let us prove it for k. We have


$0 = T(v_1 + \dots + v_k) = T(v_1) + \dots + T(v_k) = \lambda_1v_1 + \dots + \lambda_kv_k$,
which combined with the relation $\lambda_kv_1 + \dots + \lambda_kv_k = 0$ yields
$(\lambda_k - \lambda_1)v_1 + \dots + (\lambda_k - \lambda_{k-1})v_{k-1} = 0$.
The inductive hypothesis, applied to the vectors $w_i = (\lambda_k - \lambda_i)v_i$ (which still satisfy
$T(w_i) = \lambda_iw_i$), implies that $(\lambda_k - \lambda_i)v_i = 0$ for $1 \le i \le k - 1$. Since
$\lambda_k \ne \lambda_i$, this forces $v_i = 0$ for $1 \le i \le k - 1$. But then automatically $v_k = 0$, since
$v_1 + \dots + v_k = 0$. The inductive step being proved, the theorem follows. □


Problem 8.26. Let $\lambda$ be an eigenvalue of a linear map $T : V \to V$, where V is a
vector space over F, and let P be a polynomial with coefficients in F. Prove that
$P(\lambda)$ is an eigenvalue of P(T).


Solution. The hypothesis yields the existence of a nonzero vector $v \in V$ such that
$T(v) = \lambda v$. By induction, we obtain $T^k(v) = \lambda^kv$ for $k \ge 1$. Indeed, if $T^k(v) =
\lambda^kv$, then
$T^{k+1}(v) = T(T^k(v)) = T(\lambda^kv) = \lambda^kT(v) = \lambda^{k+1}v$.
We deduce that if $P(X) = a_nX^n + \dots + a_1X + a_0$, then
$P(T)(v) = a_nT^n(v) + \dots + a_1T(v) + a_0v$


$= a_n\lambda^nv + \dots + a_0v = P(\lambda)v$


and so $P(\lambda)$ is an eigenvalue of P(T), with eigenvector v. □
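The identity $P(T)(v) = P(\lambda)v$ can be tested on a matrix without forming any power of the matrix: Horner's scheme applied to vectors, $w \leftarrow Aw + a_iv$, computes $P(A)v$ directly. In the Python sketch below, the $2 \times 2$ upper-triangular matrix, with eigenvectors $(1, 0)$ and $(1, 1)$ for the eigenvalues 2 and 3, and the helper names are hypothetical illustrations.

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def poly_at_matrix_times_vector(coeffs, A, v):
    """P(A) v via Horner's scheme on vectors; coeffs = [a_0, ..., a_d]."""
    w = [0] * len(v)
    for a in reversed(coeffs):
        w = [y + a * x for y, x in zip(mat_vec(A, w), v)]   # w <- A w + a v
    return w


def evaluate(coeffs, x):
    value = 0
    for a in reversed(coeffs):
        value = value * x + a
    return value


A = [[2, 1], [0, 3]]
P = [1, -4, 1]                      # P(X) = X^2 - 4X + 1
for lam, v in ((2, [1, 0]), (3, [1, 1])):
    assert mat_vec(A, v) == [lam * x for x in v]          # v is a lam-eigenvector
    assert poly_at_matrix_times_vector(P, A, v) == [evaluate(P, lam) * x for x in v]
```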


The following consequence of the previous problem is very useful in practice: 


Problem 8.27. Let $A \in M_n(\mathbb{C})$ be a matrix and let $P \in \mathbb{C}[X]$ be a polynomial
such that $P(A) = O_n$. Prove that any eigenvalue $\lambda$ of A satisfies $P(\lambda) = 0$.


Solution. By the previous problem, $P(\lambda)$ is an eigenvalue of $P(A) = O_n$. Since 0
is the only eigenvalue of $O_n$, we deduce that $P(\lambda) = 0$. □


In particular, we obtain the following: 


Theorem 8.28. Let $T : V \to V$ be a linear transformation on a finite-dimensional
vector space V over F. Then the eigenvalues of T are precisely the roots in F of
the minimal polynomial $\mu_T$ of T.


Proof. Since $\mu_T(T) = 0$, the argument of the previous problem (which works over any field)
shows that all eigenvalues of T are roots of $\mu_T$. Conversely, let $\lambda \in F$ be a root of $\mu_T$
and assume that $\lambda$ is not an eigenvalue of T. Thus $T - \lambda\,\mathrm{id}$ is invertible. Since
$\mu_T(\lambda) = 0$, we can write $\mu_T(X) = (X - \lambda)Q(X)$ for some $Q \in F[X]$. Since
$\mu_T(T) = 0$, we deduce that


$(T - \lambda\,\mathrm{id}) \circ Q(T) = 0$.


As $T - \lambda\,\mathrm{id}$ is invertible, the last relation is equivalent to $Q(T) = 0$. Hence $\mu_T$
divides Q, which is absurd since Q is nonzero and $\deg Q < \deg \mu_T$. The theorem follows. □


The next problem is a classical result, which gives rather interesting bounds on 
the eigenvalues of a matrix. 


Problem 8.29 (Gershgorin Discs). Let $A = [a_{ij}] \in M_n(\mathbb{C})$ be a matrix and let


$R_i = \sum_{1 \le j \le n,\ j \ne i} |a_{ij}|$.


a) Prove that if $|a_{ii}| > R_i$ for all i, then A is invertible.
b) Deduce that any eigenvalue of A belongs to the set


$\bigcup_{i=1}^{n} \{z \in \mathbb{C} \mid |z - a_{ii}| \le R_i\}$.


c) Give a geometric interpretation of the result established in part b).


Solution. a) Suppose that A is not invertible, thus we can find a nonzero vector
$X \in \mathbb{C}^n$, with coordinates $x_1, x_2, \dots, x_n$, such that AX = 0. Let i be an index
such that
$|x_i| = \max_{1 \le j \le n} |x_j|$.


The ith equation of the linear system AX = 0 reads
$a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n = 0$,


or equivalently
$a_{ii}x_i = -\sum_{j \ne i} a_{ij}x_j$.


Using the triangle inequality, i.e., $|z_1 + \dots + z_n| \le |z_1| + \dots + |z_n|$, valid for
all complex numbers $z_1, \dots, z_n$, we deduce that


$|a_{ii}||x_i| = \Big|\sum_{j \ne i} a_{ij}x_j\Big| \le \sum_{j \ne i} |a_{ij}||x_j|$.


Since $|x_j| \le |x_i|$ for all j, we can further write


$|a_{ii}||x_i| \le \sum_{j \ne i} |a_{ij}||x_i| = R_i|x_i|$.
Note that $x_i \ne 0$, since otherwise $|x_j| \le |x_i| = 0$ for all j, thus $x_j = 0$ for
all j, contradicting the fact that $X \ne 0$. Thus we can divide the previous
inequality by $|x_i|$ and obtain


$|a_{ii}| \le R_i$,


which contradicts the assumption of the problem. Hence A is invertible.

b) Let $\lambda$ be an eigenvalue of A and let $B = A - \lambda I_n$. Write $B = [b_{ij}]$, with
$b_{ij} = a_{ij}$ when $i \ne j$ and $b_{ii} = a_{ii} - \lambda$. Since B is not invertible, part a)
ensures the existence of an index i such that $|b_{ii}| \le \sum_{j \ne i} |b_{ij}|$. This can also be
written as


$|a_{ii} - \lambda| \le R_i$
and shows that
$\lambda \in \bigcup_{i=1}^{n} \{z \in \mathbb{C} \mid |z - a_{ii}| \le R_i\}$.


c) The set $\{z \in \mathbb{C} \mid |z - a_{ii}| \le R_i\}$ is the closed disc centered at $a_{ii}$ and having radius
$R_i$. Thus part b) says that the eigenvalues of A are located in a union of discs
centered at the diagonal entries of A and whose radii are $R_1, \dots, R_n$. □
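The discs are easy to compute, and for a $2 \times 2$ matrix the eigenvalues are given by the quadratic formula, so the statement of part b) can be checked numerically. The Python sketch below is illustrative; the function names and the two test matrices are hypothetical, and a small tolerance absorbs floating-point rounding.

```python
import cmath


def gershgorin_discs(A):
    """(center a_ii, radius R_i) for each row, with R_i = sum_{j != i} |a_ij|."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]


def in_union_of_discs(z, discs, tol=1e-12):
    """True if |z - a_ii| <= R_i for at least one i."""
    return any(abs(z - center) <= radius + tol for center, radius in discs)


def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A) lambda + det(A) over the complex numbers."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)


for A in ([[4, 1], [2, -3]], [[1j, 0.5], [0.5, -1j]]):
    discs = gershgorin_discs(A)
    for lam in eigenvalues_2x2(A):
        assert in_union_of_discs(lam, discs)
    print(discs)
```

For larger matrices one would need a numerical eigenvalue routine to make the same check; the disc computation itself is unchanged.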


Remark 8.30. Consider
$C_i = \sum_{j \ne i} |a_{ji}|$.


Applying the result established before to ${}^tA$ (which has the same eigenvalues as A)
we obtain that the eigenvalues of A are also located in


$\bigcup_{i=1}^{n} \{z \in \mathbb{C} \mid |z - a_{ii}| \le C_i\}$.


8.3.1 Problems for Practice 


1. Find the eigenvalues and the eigenvectors of the matrix


$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix} \in M_3(\mathbb{C})$.


2. Let V be the set of matrices $A \in M_2(\mathbb{C})$ with the property that $p$ is an
eigenvector of A. Prove that V is a vector subspace of $M_2(\mathbb{C})$ and give a basis
for V.


3. Let $e_1, e_2, e_3, e_4$ be the standard basis of $\mathbb{C}^4$ and consider the set V of those
matrices $A \in M_4(\mathbb{C})$ with the property that $e_1, e_2$ are both eigenvectors of A.
Prove that V is a vector subspace of $M_4(\mathbb{C})$ and compute its dimension.


4. Find all matrices $A \in M_3(\mathbb{C})$ for which the vector $\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$ is an eigenvector with
eigenvalue 2.


5. Find the eigenvalues of the matrix $A \in M_n(\mathbb{R})$ all of whose entries are equal
to 2.


6. Find all real numbers x for which the matrix $A = \begin{pmatrix} 1 & x \\ 5 & \ldots \end{pmatrix} \in M_2(\mathbb{R})$ has


a) two distinct eigenvalues.
b) no eigenvalue.


7. Let V be the space of all polynomials with real coefficients. Let T be the linear
transformation on V sending P(X) to P(1 - X). Describe the eigenvalues of
T. Hint: what is $T \circ T$?


8. A matrix $A \in M_n(\mathbb{R})$ is called stochastic if $a_{ij} \ge 0$ for all $i, j \in [1, n]$ and
$\sum_{j=1}^{n} a_{ij} = 1$ for all $i \in [1, n]$.


a) Prove that 1 is an eigenvalue of any stochastic matrix.
b) Prove that any complex eigenvalue $\lambda$ of a stochastic matrix satisfies $|\lambda| \le 1$.


9. Consider the map $T : \mathbb{R}[X] \to \mathbb{R}[X]$ sending a polynomial P(X) to P(3X).


a) Prove that T is a bijective linear transformation, thus its inverse $T^{-1}$ exists
and is linear.

b) Find the eigenvalues of T.

c) Deduce that there is no polynomial $P \in \mathbb{R}[X]$ such that
$T^{-1} = P(T)$.


10. Let $A, B \in M_n(\mathbb{C})$ be matrices such that
$AB - BA = B$.
a) Prove that $AB^k - B^kA = kB^k$ for all $k \ge 1$.
b) Deduce that B is nilpotent. Hint: consider the eigenvalues of the map $T :
M_n(\mathbb{C}) \to M_n(\mathbb{C})$ given by $T(X) = AX - XA$.
11. Let V be the space of continuous real-valued maps on [0, 1]. Define a map $T: V \to V$ by
$$T(f)(x) = \int_0^1 \min(x, t) f(t)\,dt$$
for $f \in V$.
a) Justify that T is well defined and a linear transformation on V.
b) Is V finite dimensional?
c) Find the eigenvalues and describe the corresponding eigenspaces of T.

12. Let V be the space of polynomials with real coefficients whose degree does not exceed n and let $T: V \to V$ be the map defined by
$$T(P) = P(X) - (1 + X)P'(X).$$
a) Explain why T is a linear transformation on V.
b) Find the eigenvalues of T.

13. Let V be the space of all sequences $(x_n)_{n \geq 1}$ of real numbers. Let T be the map which associates to a sequence $(x_n)_{n \geq 1}$ the sequence whose general term is starttiin (for $n \geq 1$).
a) Prove that T is a linear transformation on V.
b) Find the eigenvalues and the corresponding eigenspaces of T.

14. Let V be the vector space of polynomials with real coefficients and let $T: V \to V$ be the map sending a polynomial P to
$$T(P) = (X^2 - 1)P''(X) + XP'(X).$$
a) Prove that T is a linear map.
b) What are the eigenvalues of T?

15. a) Let $A \in M_n(\mathbb{C})$ be a matrix with complex entries, let $P \in \mathbb{C}[X]$ be a nonconstant polynomial and let $\mu$ be an eigenvalue of $P(A)$. Prove that there is an eigenvalue $\lambda$ of A such that $P(\lambda) = \mu$ (this gives a converse of the result proved in Problem 8.26 for matrices with complex entries). Hint: factor the polynomial $P(X) - \mu$ as $c\prod_{i=1}^{d}(X - z_i)$ for some nonzero constant c and some complex numbers $z_1, \dots, z_d$, and prove that at least one of the matrices $A - z_1 I_n, \dots, A - z_d I_n$ is not invertible.
b) By considering the matrix $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, prove that the result established in part a) is false if we replace $\mathbb{C}$ with $\mathbb{R}$.
c) Suppose that a positive real number $\lambda$ is an eigenvalue of $A^2$, where $A \in M_n(\mathbb{R})$ is a matrix. Prove that $\sqrt{\lambda}$ or $-\sqrt{\lambda}$ is an eigenvalue of A.
16. Let $A \in M_n(\mathbb{R})$ be a matrix and let


Express the eigenvalues of B in terms of those of A. 
17. Consider the matrix 


Prove that the eigenvalues of A are $4\sin^2\left(\frac{j\pi}{2(n+1)}\right)$ for $1 \leq j \leq n$.


8.4 The Characteristic Polynomial 


We saw in the previous section that finding the eigenvalues of a matrix $A \in M_n(F)$ comes down to solving the polynomial equation
$$\det(\lambda I_n - A) = 0$$


in F. In this section we will study in greater detail the polynomial giving rise to this 
equation. 

By construction, the determinant of a matrix is a polynomial expression with 
integer coefficients in the entries of that matrix. The following theorem refines this 
observation a little bit. 


Theorem 8.31. Consider two matrices $A, B \in M_n(F)$. There is a polynomial $P \in F[X]$ such that for all $x \in F$ we have
$$P(x) = \det(xA + B).$$
Denoting this polynomial $P(X) = \det(XA + B)$, we have
$$\det(XA + B) = \det(A)X^n + \alpha_1 X^{n-1} + \dots + \alpha_{n-1}X + \det B$$
for some polynomial expressions $\alpha_1, \dots, \alpha_{n-1}$ with integer coefficients in the entries of A and B.
Proof. Define P by
$$P(X) = \sum_{\sigma \in S_n} \varepsilon(\sigma)(a_{1\sigma(1)}X + b_{1\sigma(1)}) \cdots (a_{n\sigma(n)}X + b_{n\sigma(n)}).$$
It is clear from the definition that P is a polynomial whose coefficients are polynomial expressions with integer coefficients in the entries of A and B. It is also clear that $P(x) = \det(xA + B)$ for $x \in F$. The constant term is given by plugging in $X = 0$ and thus equals $\det B$. Moreover, for each $\sigma \in S_n$ we have
$$\varepsilon(\sigma)(a_{1\sigma(1)}X + b_{1\sigma(1)}) \cdots (a_{n\sigma(n)}X + b_{n\sigma(n)}) = \varepsilon(\sigma)a_{1\sigma(1)} \cdots a_{n\sigma(n)}X^n + \dots,$$
all terms but the first on the right-hand side having degree at most $n - 1$. Taking the sum over $\sigma$, we see that $P(X)$ starts with $\det A \cdot X^n$, all other terms having degree at most $n - 1$. The result follows. □
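Theorem 8.31 says that $x \mapsto \det(xA + B)$ is a polynomial of degree at most n, so it is determined by its values at n + 1 points. The following Python sketch is our own illustration of this, not a method from the text (the function names `det` and `det_pencil` are hypothetical): it evaluates $\det(xA + B)$ at $x = 0, 1, \dots, n$ with exact rational arithmetic and recovers the coefficients by Lagrange interpolation, so that one can observe that the leading coefficient is $\det A$ and the constant term is $\det B$.

```python
from fractions import Fraction


def det(M):
    """Determinant of a square matrix by Gaussian elimination over Q."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result  # a row swap changes the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result


def det_pencil(A, B):
    """Coefficients of P(X) = det(XA + B), highest degree first.

    By Theorem 8.31, P has degree at most n, so its values at x = 0, 1, ..., n
    determine it; P is rebuilt by Lagrange interpolation.
    """
    n = len(A)
    xs = list(range(n + 1))
    ys = [det([[x * A[i][j] + B[i][j] for j in range(n)] for i in range(n)])
          for x in xs]
    coeffs = [Fraction(0)] * (n + 1)  # lowest degree first while building
    for k, xk in enumerate(xs):
        basis, denom = [Fraction(1)], Fraction(1)  # prod over m != k of (X - x_m)
        for m, xm in enumerate(xs):
            if m == k:
                continue
            basis = [Fraction(0)] + basis            # multiply by X ...
            for i in range(len(basis) - 1):
                basis[i] -= xm * basis[i + 1]        # ... then subtract xm times the old factor
            denom *= xk - xm
        for i, b in enumerate(basis):
            coeffs[i] += ys[k] * b / denom
    return coeffs[::-1]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(det_pencil(A, B) == [-2, -5, -1])  # True: -2X^2 - 5X - 1, with det A = -2, det B = -1
```

Interpolation is used only because it mirrors the statement of the theorem; it needs n + 1 distinct sample points, which is automatic over $\mathbb{Q}$ but impossible over a finite field with fewer than n + 1 elements.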


It follows from the theorem that if A, B have integer (respectively rational) entries, then $\det(XA + B)$ has integer (respectively rational) coefficients.
Armed with the previous results, we introduce the following definition.


Definition 8.32. The characteristic polynomial of the matrix $A \in M_n(F)$ is the polynomial $\chi_A \in F[X]$ defined by
$$\chi_A(X) = \det(X \cdot I_n - A).$$


Problem 8.33. Find the characteristic polynomial and the eigenvalues of the matrix
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 2 & 0 & -1 & 0 \\ 0 & 7 & 0 & 6 \\ 0 & 0 & 3 & 0 \end{bmatrix} \in M_4(\mathbb{R}).$$

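Solution. We compute using Laplace expansion with respect to the first row:
$$\chi_A(X) = \det(XI_4 - A) = \begin{vmatrix} X & -1 & 0 & 0 \\ -2 & X & 1 & 0 \\ 0 & -7 & X & -6 \\ 0 & 0 & -3 & X \end{vmatrix} = X\begin{vmatrix} X & 1 & 0 \\ -7 & X & -6 \\ 0 & -3 & X \end{vmatrix} + \begin{vmatrix} -2 & 1 & 0 \\ 0 & X & -6 \\ 0 & -3 & X \end{vmatrix}$$
$$= X(X^3 - 11X) - 2(X^2 - 18) = X^4 - 13X^2 + 36.$$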
In order to find the eigenvalues of A, we need to find the real solutions of the equation
$$x^4 - 13x^2 + 36 = 0.$$
Letting $y = x^2$, we obtain the quadratic equation
$$y^2 - 13y + 36 = 0$$
with solutions $y_1 = 4$ and $y_2 = 9$. Solving the equations $x^2 = 4$ and $x^2 = 9$ yields the eigenvalues $\pm 2$, $\pm 3$ of A. □
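Hand computations like the one above are easy to double-check by machine. The sketch below uses the Faddeev-LeVerrier recursion, a standard technique that is not discussed in the text (all function names are our own), to compute the coefficients of $\chi_A$ exactly over $\mathbb{Q}$. Because it divides by $1, 2, \dots, n$, it assumes a field of characteristic 0; the integer roots are then found by a small search.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly(A):
    """Coefficients of chi_A(X) = det(X I_n - A), highest degree first.

    Faddeev-LeVerrier: M_0 = 0, c_n = 1 and, for k = 1, ..., n,
    M_k = A M_{k-1} + c_{n-k+1} I_n and c_{n-k} = -Tr(A M_k) / k.
    The division by k restricts this to fields of characteristic 0.
    """
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        coeffs.append(-trace(matmul(A, M)) / k)
    return coeffs


def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given highest degree first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


A = [[0, 1, 0, 0], [2, 0, -1, 0], [0, 7, 0, 6], [0, 0, 3, 0]]
chi = charpoly(A)
print([int(c) for c in chi])                               # [1, 0, -13, 0, 36]
print([x for x in range(-5, 6) if evaluate(chi, x) == 0])  # [-3, -2, 2, 3]
```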


Problem 8.34. Find the characteristic polynomial and the eigenvalues of the matrix
$$A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \in M_3(\mathbb{F}_2).$$


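Solution. We will constantly use that $-1 = 1$ in $\mathbb{F}_2$. We obtain
$$\chi_A(X) = \det(XI_3 - A) = \det(XI_3 + A) = \begin{vmatrix} X & 1 & 1 \\ 1 & X & 1 \\ 1 & 1 & X+1 \end{vmatrix} = \begin{vmatrix} X+1 & 0 & 1 \\ X+1 & X+1 & 1 \\ 0 & X & X+1 \end{vmatrix},$$
the last equality being obtained by adding the second column to the first one and the third column to the second one. Now
$$\begin{vmatrix} X+1 & 0 & 1 \\ X+1 & X+1 & 1 \\ 0 & X & X+1 \end{vmatrix} = (X+1)\begin{vmatrix} 1 & 0 & 1 \\ 1 & X+1 & 1 \\ 0 & X & X+1 \end{vmatrix} = (X+1)(X^2+1) = (X+1)^3,$$
since $X^2 + 1 = (X+1)^2$ in $\mathbb{F}_2[X]$. Thus
$$\chi_A(X) = (X+1)^3$$
and consequently the unique eigenvalue of A is 1. □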
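Over $\mathbb{F}_2$ one cannot divide by 2, so methods that rely on dividing by integers are unavailable. A direct alternative, sketched below under the assumption that the field is $\mathbb{F}_p$ for a prime p (all helper names are hypothetical), is to expand $\det(XI_n - A)$ with the permutation formula, treating the entries of $XI_n - A$ as polynomials whose coefficients are reduced mod p. It costs n! products, so it is only meant for small matrices such as the one above.

```python
from itertools import permutations


def poly_mul(p, q, mod):
    """Product of polynomials over F_p (coefficient lists, lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % mod
    return out


def poly_add(p, q, mod):
    size = max(len(p), len(q))
    p, q = p + [0] * (size - len(p)), q + [0] * (size - len(q))
    return [(a + b) % mod for a, b in zip(p, q)]


def signature(perm):
    """+1 or -1 according to the parity of the number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def charpoly_mod_p(A, mod):
    """chi_A(X) = det(X I_n - A) over F_p, lowest degree first, via the
    permutation (Leibniz) formula applied to X I_n - A with entries in F_p[X]."""
    n = len(A)
    entry = [[[-A[i][j] % mod, 1] if i == j else [-A[i][j] % mod]
              for j in range(n)] for i in range(n)]
    total = [0]
    for perm in permutations(range(n)):
        term = [signature(perm) % mod]
        for i in range(n):
            term = poly_mul(term, entry[i][perm[i]], mod)
        total = poly_add(total, term, mod)
    return total


A = [[0, 1, 1], [1, 0, 1], [1, 1, 1]]
chi = charpoly_mod_p(A, 2)
print(chi)  # [1, 1, 1, 1], i.e. X^3 + X^2 + X + 1 = (X + 1)^3 over F_2
print([x for x in range(2) if sum(c * x**k for k, c in enumerate(chi)) % 2 == 0])  # [1]
```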


In the following more theoretical exercises, we will

• compute the characteristic polynomial for a rather large class of matrices: upper-triangular, nilpotent, companion matrices, etc.

• establish a few basic properties of the characteristic polynomial which turn out to be important in practice or in theoretical problems.
For upper-triangular matrices the characteristic polynomial can be read off directly from the diagonal entries:


Problem 8.35. Let $A = [a_{ij}]$ be an upper-triangular matrix (so that $a_{ij} = 0$ whenever $i > j$). Prove that
$$\chi_A(X) = \prod_{i=1}^{n}(X - a_{ii}).$$

Solution. The matrix $XI_n - A$ is again upper-triangular, with diagonal entries equal to $X - a_{ii}$. The result follows directly from Theorem 7.41. □


Recall that ${}^tA$ is the transpose of the matrix A.


Problem 8.36. Prove that A and ${}^tA$ have the same characteristic polynomial when $A \in M_n(F)$.

Solution. Indeed ${}^t(XI_n - A) = XI_n - {}^tA$. Since a matrix and its transpose have the same determinant (Theorem 7.37), we have
$$\chi_A(X) = \det(XI_n - A) = \det({}^t(XI_n - A)) = \det(XI_n - {}^tA) = \chi_{{}^tA}(X),$$
as desired. □


Problem 8.37. Prove that the characteristic polynomial $\chi_A$ of $A \in M_n(F)$ is of the form
$$\chi_A(X) = X^n - \operatorname{Tr}(A)X^{n-1} + \dots + (-1)^n \det A.$$
Solution. Let us come back to the definition
$$\det(XI_n - A) = \sum_{\sigma \in S_n} \varepsilon(\sigma)(X\delta_{1\sigma(1)} - a_{1\sigma(1)}) \cdots (X\delta_{n\sigma(n)} - a_{n\sigma(n)}),$$
where $\delta_{ij}$ equals 1 if $i = j$ and 0 otherwise.

A brutal expansion shows that
$$(X\delta_{1\sigma(1)} - a_{1\sigma(1)}) \cdots (X\delta_{n\sigma(n)} - a_{n\sigma(n)}) = X^n \prod_{i=1}^{n}\delta_{i\sigma(i)} - X^{n-1}\sum_{j=1}^{n}\Big(\prod_{k \neq j}\delta_{k\sigma(k)}\Big)a_{j\sigma(j)} + \dots,$$
where the omitted terms have degree at most $n - 2$.

Note that $\prod_{i=1}^{n}\delta_{i\sigma(i)}$ is nonzero only for the identity permutation, in which case it equals 1. This already shows that $\chi_A(X)$ is monic of degree n. It is clear that its constant term is $\chi_A(0) = \det(-A) = (-1)^n \det A$ (all these results also follow straight from Theorem 8.31).

Next, if $j \in \{1, 2, \dots, n\}$, then $\prod_{k \neq j}\delta_{k\sigma(k)}$ is nonzero only when $\sigma(k) = k$ for all $k \neq j$, but then automatically $\sigma(j) = j$ (as $\sigma$ is a permutation) and so $\sigma$ is the identity permutation. Thus the coefficient of $X^{n-1}$ in $(X\delta_{1\sigma(1)} - a_{1\sigma(1)}) \cdots (X\delta_{n\sigma(n)} - a_{n\sigma(n)})$ vanishes unless $\sigma$ is the identity permutation, in which case this coefficient equals $-\sum_{j=1}^{n} a_{jj} = -\operatorname{Tr}(A)$. This shows that the coefficient of $X^{n-1}$ in $\chi_A(X)$ is $-\operatorname{Tr}(A)$. □


Problem 8.38. Let $A \in M_n(F)$ be a nilpotent matrix.

a) Prove that
$$\chi_A(X) = X^n.$$

b) Prove that $\operatorname{Tr}(A^k) = 0$ for all $k \geq 1$.

Solution. a) Note that by definition there is a positive integer k such that $A^k = O_n$. Then
$$X^k I_n = X^k I_n - A^k = (XI_n - A)(X^{k-1}I_n + X^{k-2}A + \dots + A^{k-1}).$$
Taking determinants yields
$$X^{nk} = \chi_A(X) \cdot \det(X^{k-1}I_n + \dots + A^{k-1}).$$
The right-hand side is the product of two polynomials (again by the polynomial nature of the determinant). We deduce that $\chi_A(X)$ divides the monomial $X^{nk}$, whose only monic divisors are the powers of X. Since moreover $\chi_A(X)$ is monic of degree n (by Problem 8.37), it follows that $\chi_A(X) = X^n$.

b) Replacing A with $A^k$ (which is also nilpotent), we may assume that $k = 1$. We need to prove that $\operatorname{Tr}(A) = 0$. But by part a) $\chi_A(X) = X^n$, thus the coefficient of $X^{n-1}$ in $\chi_A(X)$ is 0. By the previous problem, this coefficient equals $-\operatorname{Tr}(A)$, thus $\operatorname{Tr}(A) = 0$. □
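Problem 8.38 can be observed on an example. In the sketch below (our own; the matrices N and P are arbitrary choices), a strictly upper-triangular matrix N is conjugated by an invertible matrix P whose inverse is written out explicitly, which produces a nilpotent matrix whose entries no longer reveal nilpotency. Its characteristic polynomial and the traces of its first few powers are then computed exactly; the characteristic polynomial uses the Faddeev-LeVerrier recursion, a standard technique not taken from the text, valid in characteristic 0.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly(A):
    """det(X I_n - A), highest degree first (Faddeev-LeVerrier, characteristic 0)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        coeffs.append(-trace(matmul(A, M)) / k)
    return coeffs


# A strictly upper-triangular N is nilpotent; conjugating it by P hides its shape.
N = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]
P = [[1, 0, 0], [1, 1, 0], [2, 1, 1]]
P_inv = [[1, 0, 0], [-1, 1, 0], [-1, -1, 1]]
A = matmul(matmul(P, N), P_inv)

powers, current = [], A
for _ in range(5):          # A, A^2, ..., A^5
    powers.append(current)
    current = matmul(current, A)

print([int(c) for c in charpoly(A)])      # [1, 0, 0, 0], i.e. chi_A(X) = X^3
print([trace(Ak) for Ak in powers])       # [0, 0, 0, 0, 0]
```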


The following computation will play a fundamental role in the next section, which deals with the Cayley-Hamilton theorem. It also shows that any monic polynomial of degree n with coefficients in F is the characteristic polynomial of some matrix in $M_n(F)$.


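Problem 8.39. Let $a_0, a_1, \dots, a_{n-1} \in F$ and let
$$A = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & a_0 \\ 1 & 0 & 0 & \cdots & 0 & a_1 \\ 0 & 1 & 0 & \cdots & 0 & a_2 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & a_{n-1} \end{bmatrix}.$$
Prove that
$$\chi_A = X^n - a_{n-1}X^{n-1} - \dots - a_1X - a_0.$$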

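Solution. Let $P = X^n - a_{n-1}X^{n-1} - \dots - a_1X - a_0$. Consider the matrix
$$B = XI_n - A = \begin{bmatrix} X & 0 & 0 & \cdots & 0 & -a_0 \\ -1 & X & 0 & \cdots & 0 & -a_1 \\ 0 & -1 & X & \cdots & 0 & -a_2 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & X - a_{n-1} \end{bmatrix}.$$
Adding to the first row of B the second row multiplied by X, the third row multiplied by $X^2$, ..., the nth row multiplied by $X^{n-1}$, we obtain the matrix
$$C = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & P \\ -1 & X & 0 & \cdots & 0 & -a_1 \\ 0 & -1 & X & \cdots & 0 & -a_2 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & X - a_{n-1} \end{bmatrix}.$$
We have $\chi_A = \det B = \det C$ and, expanding $\det C$ with respect to the first row, we obtain
$$\det C = (-1)^{n+1}P \cdot \begin{vmatrix} -1 & X & \cdots & 0 \\ 0 & -1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & -1 \end{vmatrix} = (-1)^{n+1}P \cdot (-1)^{n-1} = P,$$
observing that the matrix whose determinant we need to evaluate is upper-triangular with diagonal entries $-1$. The result follows. □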
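Problem 8.39 gives, for any prescribed monic polynomial, an explicit matrix having it as characteristic polynomial. The sketch below (function names are our own; the characteristic polynomial is computed with the Faddeev-LeVerrier recursion, a standard technique not from the text that assumes characteristic 0) builds this matrix from $a_0, \dots, a_{n-1}$ and checks the formula on one example.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly(A):
    """det(X I_n - A), highest degree first (Faddeev-LeVerrier, characteristic 0)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        coeffs.append(-trace(matmul(A, M)) / k)
    return coeffs


def companion(a):
    """Matrix of Problem 8.39 for a = [a_0, ..., a_{n-1}]: ones just below the
    diagonal, a_0, ..., a_{n-1} down the last column, zeros elsewhere."""
    n = len(a)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = a[i]
    return C


a = [5, -3, 0, 2]                                # a_0, a_1, a_2, a_3
print([int(c) for c in charpoly(companion(a))])  # [1, -2, 0, 3, -5]: X^4 - 2X^3 + 3X - 5
```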


Recall that two matrices $A, B \in M_n(F)$ are called similar if they represent the same linear transformation of $F^n$ in possibly different bases of this F-vector space. Equivalently, A and B are similar if there is $P \in GL_n(F)$ such that $B = PAP^{-1}$, i.e., they are conjugated by an invertible matrix. A fundamental property is that the characteristic polynomial is invariant under similarity of matrices. More precisely:


Theorem 8.40. Two similar matrices have the same characteristic polynomial.
Proof. Suppose that A and B are similar, thus we can find an invertible matrix $P \in M_n(F)$ such that $B = PAP^{-1}$. Note that
$$XI_n - B = XPP^{-1} - PAP^{-1} = P(XI_n - A)P^{-1}.$$
Now, we will take for granted that the determinant is still defined and multiplicative for matrices with entries in $F[X]$ (recall that $F[X]$ is the set of polynomials in one variable with coefficients in F). The existence is easy, since one can simply define in the usual way
$$\det A = \sum_{\sigma \in S_n} \varepsilon(\sigma)a_{1\sigma(1)} \cdots a_{n\sigma(n)}$$
for a matrix $A = [a_{ij}]$ with entries in $F[X]$. The fact that the determinant is multiplicative is trickier (the hardest case being the case when F is a finite field) and we will take it for granted.

Consider then P, $XI_n - A$, $XI_n - B$ as matrices with entries in $F[X]$. The inverse of P in $M_n(F)$ is also an inverse of P in $M_n(F[X])$, thus P is invertible considered as a matrix in $M_n(F[X])$. The multiplicative character of the determinant map yields
$$\chi_B(X) = \det(XI_n - B) = \det(P) \cdot \det(XI_n - A) \cdot \det(P)^{-1} = \det(XI_n - A) = \chi_A(X),$$
as desired. □
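The sketch below checks Theorem 8.40 on one example: it conjugates a matrix A by an invertible P (with explicitly known inverse) and compares the characteristic polynomials, computed with the Faddeev-LeVerrier recursion (a standard technique not from the text, valid in characteristic 0; the function names are our own). It also illustrates that the converse of the theorem fails: $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $O_2$ share the characteristic polynomial $X^2$, yet $O_2$ is similar only to itself.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly(A):
    """det(X I_n - A), highest degree first (Faddeev-LeVerrier, characteristic 0)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        coeffs.append(-trace(matmul(A, M)) / k)
    return coeffs


A = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
P = [[1, 0, 0], [1, 1, 0], [2, 1, 1]]
P_inv = [[1, 0, 0], [-1, 1, 0], [-1, -1, 1]]
B = matmul(matmul(P, A), P_inv)              # B = P A P^{-1}
print(B != A, charpoly(B) == charpoly(A))    # True True

# The converse fails: same characteristic polynomial X^2, but not similar.
print(charpoly([[0, 1], [0, 0]]) == charpoly([[0, 0], [0, 0]]))  # True
```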


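Problem 8.41. Prove that if $A, B \in M_n(F)$, then AB and BA have the same characteristic polynomial. You may assume for simplicity that $F = \mathbb{R}$ or $F = \mathbb{C}$.

Solution. If A is invertible, then AB and BA are similar, as
$$AB = ABAA^{-1} = A(BA)A^{-1}.$$
The previous theorem yields the result in this case.

Suppose now that A is not invertible. As A has only finitely many eigenvalues (Corollary 8.21) and since F is infinite, there are infinitely many $\lambda \in F$ such that $A_\lambda := \lambda \cdot I_n - A$ is invertible. By the first paragraph, for all such $\lambda$ and all $x \in F$ we have
$$\det(xI_n - A_\lambda B) = \det(xI_n - BA_\lambda).$$
Since $A_\lambda B = \lambda B - AB$ and $BA_\lambda = \lambda B - BA$, this can be written as
$$\det(xI_n - \lambda B + AB) = \det(xI_n - \lambda B + BA).$$
Fix $x \in F$. By Theorem 8.31, both sides are polynomials in $\lambda$. Since they agree on infinitely many values of $\lambda$, these polynomials are equal. In particular, they agree at $\lambda = 0$, which gives $\det(xI_n + AB) = \det(xI_n + BA)$ for all $x \in F$. Replacing x with $-x$ and multiplying by $(-1)^n$, we obtain $\chi_{AB}(x) = \chi_{BA}(x)$ for all $x \in F$; since F is infinite, $\chi_{AB} = \chi_{BA}$, which is exactly the desired result. □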
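The following sketch (our own example matrices; characteristic polynomials via the Faddeev-LeVerrier recursion, a standard technique not from the text, valid in characteristic 0) checks Problem 8.41 for a singular A. It also shows why the similarity argument of the first paragraph cannot be extended: for $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ we get $AB = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \neq O_2 = BA$, so AB and BA are not similar, although both have characteristic polynomial $X^2$.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly(A):
    """det(X I_n - A), highest degree first (Faddeev-LeVerrier, characteristic 0)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        coeffs.append(-trace(matmul(A, M)) / k)
    return coeffs


A = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]   # singular: the second row is twice the first
B = [[0, 1, 2], [1, 0, 1], [3, 1, 0]]
print(charpoly(matmul(A, B)) == charpoly(matmul(B, A)))  # True

A2 = [[1, 0], [0, 0]]
B2 = [[0, 1], [0, 0]]
print(matmul(A2, B2), matmul(B2, A2))   # [[0, 1], [0, 0]] [[0, 0], [0, 0]]
print(charpoly(matmul(A2, B2)) == charpoly(matmul(B2, A2)))  # True
```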


Remark 8.42. The previous proof crucially uses the fact that F is infinite. The same result is true if $F = \mathbb{F}_p$ (or more generally for any field), but the proof requires more tools from algebra.


The previous theorem shows that the following definition makes sense. 
Definition 8.43. Let V be a finite dimensional F-vector space. The characteristic polynomial $\chi_T$ of the linear transformation T of V is the characteristic polynomial of the matrix of T in any basis.


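Problem 8.44. Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by
$$T(x_1, x_2, x_3) = (x_1 - 2x_2 + x_3,\ x_2 - x_3,\ x_1).$$
Compute the characteristic polynomial of T.

Solution. The matrix of T with respect to the canonical basis is
$$A = \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & -1 \\ 1 & 0 & 0 \end{bmatrix}.$$
Thus, expanding with respect to the first column,
$$\chi_T(X) = \chi_A(X) = \begin{vmatrix} X-1 & 2 & -1 \\ 0 & X-1 & 1 \\ -1 & 0 & X \end{vmatrix} = (X-1)\begin{vmatrix} X-1 & 1 \\ 0 & X \end{vmatrix} - \begin{vmatrix} 2 & -1 \\ X-1 & 1 \end{vmatrix} = X(X-1)^2 - (X+1) = X^3 - 2X^2 - 1.$$
□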


Problem 8.45. Let $T: V \to V$ be a linear transformation on a finite dimensional vector space and let W be a subspace of V which is stable under T. Let $T_1$ be the restriction of T to W. Prove that $\chi_{T_1}$ divides $\chi_T$.

Solution. Choose a basis $w_1, \dots, w_k$ of W and complete it to a basis $w_1, \dots, w_k, v_{k+1}, \dots, v_n$ of V. Since W is stable under T, the matrix of T with respect to the basis $w_1, \dots, w_k, v_{k+1}, \dots, v_n$ is of the form $\begin{bmatrix} A & * \\ 0 & B \end{bmatrix}$, where $A \in M_k(F)$ is the matrix of $T_1$ with respect to $w_1, \dots, w_k$ and $B \in M_{n-k}(F)$. Using properties of block-determinants (more precisely Theorem 7.43) we obtain
$$\chi_T(X) = \chi_A(X) \cdot \chi_B(X),$$
and the result follows, since $\chi_A = \chi_{T_1}$. □


The previous problem allows us to make the precise link between the characteristic polynomial and eigenspaces: by construction the eigenvalues of a matrix can be recovered as the roots in F of the characteristic polynomial, but it is not clear how to deal with their possible multiplicities. Actually, there are two different (and important) notions of multiplicity:
Definition 8.46. Let $T: V \to V$ be a linear transformation on a finite dimensional vector space V over F and let $\lambda \in F$ be an eigenvalue of T.

a) The geometric multiplicity of $\lambda$ is the dimension of the F-vector space $\operatorname{Ker}(\lambda \cdot \mathrm{id} - T)$.

b) The algebraic multiplicity of $\lambda$ is the multiplicity of $\lambda$ as a root of the characteristic polynomial $\chi_T$ of T (i.e., the largest integer j such that $(X - \lambda)^j$ divides $\chi_T(X)$).


Of course, we have similar definitions for the multiplicities of an eigenvalue of a matrix: if $A \in M_n(F)$ and $\lambda \in F$ is an eigenvalue of A, the algebraic multiplicity of $\lambda$ is the multiplicity of $\lambda$ as a root of $\chi_A$, while the geometric multiplicity of $\lambda$ is $\dim \operatorname{Ker}(\lambda I_n - A)$. A good exercise for the reader is to check that if A is the matrix of a linear transformation T with respect to any basis of V, then the corresponding multiplicities of $\lambda$ for A and for T are the same.


Remark 8.47. The algebraic multiplicity and the geometric multiplicity are not always equal: consider the matrix $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. It has 0 as an eigenvalue with geometric multiplicity 1: indeed the system $AX = 0$ is equivalent to $x_2 = 0$, thus $\operatorname{Ker}(A)$ is the line spanned by the first vector of the canonical basis of $F^2$. On the other hand, the characteristic polynomial of A is $\chi_A(X) = X^2$, thus the algebraic multiplicity of 0 is 2. If the algebraic multiplicity of an eigenvalue $\lambda$ coincides with its geometric multiplicity, we will simply refer to this common value as the multiplicity of $\lambda$.


Problem 8.48. Consider the matrix
$$A = \begin{bmatrix} 8 & -1 & -5 \\ -2 & 3 & 1 \\ 4 & -1 & -1 \end{bmatrix} \in M_3(\mathbb{R}).$$

a) Find the characteristic polynomial and the eigenvalues of A.
b) For each eigenvalue $\lambda$ of A, find the algebraic and the geometric multiplicity of $\lambda$.


Solution. a) Adding the second and third columns to the first one yields
$$\chi_A(X) = \begin{vmatrix} X-8 & 1 & 5 \\ 2 & X-3 & -1 \\ -4 & 1 & X+1 \end{vmatrix} = \begin{vmatrix} X-2 & 1 & 5 \\ X-2 & X-3 & -1 \\ X-2 & 1 & X+1 \end{vmatrix} = (X-2)\begin{vmatrix} 1 & 1 & 5 \\ 1 & X-3 & -1 \\ 1 & 1 & X+1 \end{vmatrix}.$$
To compute the last determinant, subtract the first row from the second and the third row, which produces the rows $(0, X-4, -6)$ and $(0, 0, X-4)$, then expand with respect to the first column. We obtain in the end
$$\chi_A(X) = (X-2)(X-4)^2.$$
The eigenvalues of A are the real roots of $\chi_A$, thus they are 2 and 4.

b) Since $\chi_A(X) = (X-2)(X-4)^2$, it follows that 2 has algebraic multiplicity 1 and 4 has algebraic multiplicity 2. To find the geometric multiplicity of 2, we determine the 2-eigenspace by solving the system $AX = 2X$. The reader will check without difficulty that the system is equivalent to $x = y = z$ (where x, y, z are the coordinates of X), thus the 2-eigenspace is one-dimensional and the geometric multiplicity of the eigenvalue 2 is 1 (we could have obtained this without any computation using Theorem 8.49 below). For the eigenvalue 4, we proceed similarly by solving the system $AX = 4X$. An easy computation shows that the system is equivalent to $y = -x$ and $z = x$, thus the 4-eigenspace is also one-dimensional and so the geometric multiplicity of the eigenvalue 4 is also 1. □
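For Problem 8.48 both multiplicities can be computed mechanically. The sketch below (our own; function names are hypothetical, arithmetic is exact over $\mathbb{Q}$) obtains the geometric multiplicity as $n - \operatorname{rank}(\lambda I_n - A)$ by the rank-nullity theorem, and the algebraic multiplicity by dividing $\chi_A$ repeatedly by $X - \lambda$ (synthetic division) until a nonzero remainder appears. $\chi_A$ itself comes from the Faddeev-LeVerrier recursion, a standard technique not taken from the text that assumes characteristic 0.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly(A):
    """det(X I_n - A), highest degree first (Faddeev-LeVerrier, characteristic 0)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        coeffs.append(-trace(matmul(A, M)) / k)
    return coeffs


def rank(M):
    """Rank by Gaussian elimination over Q."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def algebraic_multiplicity(coeffs, lam):
    """Multiplicity of lam as a root of a nonzero polynomial (highest degree first)."""
    m = 0
    while len(coeffs) > 1:
        quotient = [coeffs[0]]
        for c in coeffs[1:]:
            quotient.append(c + lam * quotient[-1])
        if quotient.pop() != 0:       # the last entry is the remainder
            break
        coeffs, m = quotient, m + 1
    return m


def geometric_multiplicity(A, lam):
    """dim Ker(lam I_n - A) = n - rank(lam I_n - A), by rank-nullity."""
    n = len(A)
    return n - rank([[(lam if i == j else 0) - A[i][j] for j in range(n)]
                     for i in range(n)])


A = [[8, -1, -5], [-2, 3, 1], [4, -1, -1]]
chi = charpoly(A)                     # X^3 - 10X^2 + 32X - 32 = (X - 2)(X - 4)^2
for lam in (2, 4):
    print(lam, algebraic_multiplicity(chi, lam), geometric_multiplicity(A, lam))
# prints "2 1 1" and then "4 2 1"
```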


As we have already seen, algebraic multiplicity and geometric multiplicity are not the same thing. The next result, however, gives valuable information about the link between the two notions.


Theorem 8.49. Let $A \in M_n(F)$ and let $\lambda \in F$ be an eigenvalue of A. Then the geometric multiplicity of $\lambda$ does not exceed its algebraic multiplicity. In particular, if the algebraic multiplicity of $\lambda$ is 1, then its geometric multiplicity equals 1.


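Proof. Let $V = F^n$ and let T be the linear map on V attached to A. Let $W = \operatorname{Ker}(\lambda I_n - A) = \operatorname{Ker}(\lambda \cdot \mathrm{id} - T)$. Then W is stable under T, thus by Problem 8.45 (letting $T|_W$ be the restriction of T to W) $\chi_{T|_W}$ divides $\chi_T$. On the other hand, $T|_W$ is simply multiplication by $\lambda$ on W, thus
$$\chi_{T|_W}(X) = (X - \lambda)^{\dim W}.$$
It follows that $(X - \lambda)^{\dim W}$ divides $\chi_A(X) = \chi_T(X)$, so $\dim W$ does not exceed the algebraic multiplicity of $\lambda$. Since $\lambda$ is an eigenvalue, $\dim W \geq 1$, which gives the last assertion. □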
The result established in the next problem is very important in applications: 


Problem 8.50. Let $A \in M_n(\mathbb{C})$ be a matrix with complex entries. Let $\mathrm{Sp}(A)$ be the set of eigenvalues of A (we call $\mathrm{Sp}(A)$ the spectrum of A) and, for $\lambda \in \mathrm{Sp}(A)$, let $m_\lambda$ be the algebraic multiplicity of $\lambda$.

a) Explain the equality of polynomials
$$\chi_A(X) = \prod_{\lambda \in \mathrm{Sp}(A)} (X - \lambda)^{m_\lambda}.$$
b) Prove that
$$\operatorname{Tr}(A) = \sum_{\lambda \in \mathrm{Sp}(A)} m_\lambda \lambda.$$
In other words, the trace of a complex matrix is the sum of its eigenvalues, counted with their algebraic multiplicities.
c) Prove that
$$\det A = \prod_{\lambda \in \mathrm{Sp}(A)} \lambda^{m_\lambda},$$
that is, the determinant of a complex matrix is the product of its eigenvalues, counted with their algebraic multiplicities.


Solution. a) It is clear by definition of algebraic multiplicities that $\prod_{\lambda \in \mathrm{Sp}(A)} (X - \lambda)^{m_\lambda}$ divides $\chi_A(X)$, since each factor $(X - \lambda)^{m_\lambda}$ divides $\chi_A(X)$ and these factors are pairwise coprime (this holds for a matrix with coefficients in any field). To prove the opposite divisibility (which allows us to conclude since both polynomials are monic), we will crucially exploit the fact that the matrix has complex entries and that $\mathbb{C}$ is algebraically closed. In particular, we know that $\chi_A$ splits in $\mathbb{C}[X]$ into a product of linear factors $X - z$. Any such z is an eigenvalue of A, since $\det(zI_n - A) = 0$. Hence $z \in \mathrm{Sp}(A)$ and by definition its multiplicity as a root of $\chi_A(X)$ is $m_z$. The result follows.

b) The coefficient of $X^{n-1}$ in $\prod_{\lambda \in \mathrm{Sp}(A)} (X - \lambda)^{m_\lambda}$ is $-\sum_{\lambda \in \mathrm{Sp}(A)} m_\lambda \lambda$. On the other hand, the coefficient of $X^{n-1}$ in $\chi_A$ equals $-\operatorname{Tr}(A)$ by Problem 8.37. The result follows from a).

c) Taking $X = 0$ in the equality established in a) and using the fact that $\chi_A(0) = (-1)^n \det A$ and that $\sum_{\lambda \in \mathrm{Sp}(A)} m_\lambda = n$, we obtain
$$(-1)^n \det A = \chi_A(0) = \prod_{\lambda \in \mathrm{Sp}(A)} (-\lambda)^{m_\lambda} = (-1)^n \prod_{\lambda \in \mathrm{Sp}(A)} \lambda^{m_\lambda}.$$
The result follows by dividing by $(-1)^n$. □


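Remark 8.51. If we replace $\mathbb{C}$ with $\mathbb{R}$ or $\mathbb{Q}$, the result fails in general: it may even happen that $\mathrm{Sp}(A)$ is empty! Indeed, consider for instance the matrix $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, whose characteristic polynomial $X^2 + 1$ has no real root.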


Here is a nice application of the previous problem. 


Problem 8.52. a) Let $A \in M_n(\mathbb{R})$ be a matrix such that
$$A^2 - 3A + 2I_n = O_n.$$
Prove that $\det A \in \{1, 2, 4, \dots, 2^n\}$.
b) Let $k \in \{1, 2, 4, \dots, 2^n\}$. Construct a matrix $A \in M_n(\mathbb{R})$ such that $A^2 - 3A + 2I_n = O_n$ and $\det A = k$.

Solution. a) By Problem 8.27, for any complex eigenvalue $\lambda$ of A we have $\lambda^2 - 3\lambda + 2 = 0$, that is $(\lambda - 1)(\lambda - 2) = 0$. It follows that each complex eigenvalue of A is either 1 or 2. Since $\det A$ is the product of all complex eigenvalues of A, counted with their algebraic multiplicities (Problem 8.50), the result follows.

b) Write $k = 2^p$ with $p \in \{0, 1, \dots, n\}$. Then a diagonal matrix A having p diagonal entries equal to 2 and the other diagonal entries equal to 1 is a solution of the problem. □
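The construction in part b) is easy to check directly. The sketch below (function names are our own) builds the diagonal matrix with p entries equal to 2 and n - p entries equal to 1 for $k = 2^p$, verifies that $A^2 - 3A + 2I_n = O_n$, and compares the product of the diagonal entries with k.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def construct(n, k):
    """Diagonal matrix with p diagonal entries equal to 2 and n - p equal to 1,
    where k = 2**p, as in part b) of Problem 8.52."""
    p = k.bit_length() - 1
    if k < 1 or 2 ** p != k or p > n:
        raise ValueError("k must be one of 1, 2, 4, ..., 2**n")
    return [[(2 if i < p else 1) if i == j else 0 for j in range(n)] for i in range(n)]


def satisfies_equation(A):
    """Check that A^2 - 3A + 2I_n is the zero matrix."""
    n = len(A)
    A2 = matmul(A, A)
    return all(A2[i][j] - 3 * A[i][j] + 2 * (i == j) == 0
               for i in range(n) for j in range(n))


def diagonal_det(A):
    """Determinant of a diagonal matrix: the product of its diagonal entries."""
    result = 1
    for i in range(len(A)):
        result *= A[i][i]
    return result


n = 4
for k in (1, 2, 4, 8, 16):
    A = construct(n, k)
    print(k, satisfies_equation(A), diagonal_det(A) == k)  # every line ends with True True
```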


8.4.1 Problems for Practice 


1. Find the characteristic polynomial and the eigenvalues of the matrix 


2. Find the characteristic polynomial and the eigenvalues of the matrix 


1100 
0101 
A=] iorn © Mi): 


0011 


3. a) Give an example of a matrix $A \in M_4(\mathbb{R})$ whose characteristic polynomial
equals X4 — X? + 1. 
b) Is there a matrix $A \in M_3(\mathbb{Q})$ whose characteristic polynomial equals $X^3 - \sqrt{2}$? Give an example of such a matrix in $M_3(\mathbb{R})$.
4. For each of the matrices below, compute its characteristic and minimal polynomial
a) 
j= —1 —3 
2 1 
b) 
100 
A=1|]020 
Then find their eigenvalues and the corresponding eigenspaces, by considering these matrices as matrices with rational entries. Then do the same by considering these matrices as matrices with real (and finally with complex) entries.

5. Let $n \geq 2$ and let
$$A = \begin{bmatrix} 1 & 2 & 2 & \cdots & 2 \\ 2 & 1 & 2 & \cdots & 2 \\ 2 & 2 & 1 & \cdots & 2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 2 & 2 & 2 & \cdots & 1 \end{bmatrix} \in M_n(\mathbb{R}).$$
a) Compute the minimal polynomial and the characteristic polynomial of A.
b) Describe the eigenvalues of A and the corresponding eigenspaces.

6. a) Let $A \in M_n(\mathbb{R})$ be the matrix associated with the projection of $\mathbb{R}^n$ onto a subspace W along a complementary subspace of W. Compute the characteristic polynomial of A in terms of n and $\dim W$.
b) Answer the same question assuming that A is the matrix associated with the symmetry with respect to a subspace W along a complementary subspace of W.

7. Consider the following three $5 \times 5$ nilpotent matrices
$$A = \begin{bmatrix} 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \end{bmatrix}, \quad B = \begin{bmatrix} 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \end{bmatrix}, \quad C = \begin{bmatrix} 0&1&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&0&0 \\ 0&0&0&0&1 \\ 0&0&0&0&0 \end{bmatrix}.$$
Since these matrices are nilpotent, they all have characteristic polynomial
$$\chi_A(X) = \chi_B(X) = \chi_C(X) = X^5.$$
a) Compute the minimal polynomials of these matrices and use them to show that A is not similar to either B or C.
b) Compute the dimensions of the kernels of these matrices and use them to show that B is not similar to A or C.

8. Let $A \in M_n(\mathbb{R})$ be a matrix such that $A^2 + I_n = O_n$. Prove that $\operatorname{Tr}(A)$ is an integer.

9. Prove that any matrix $A \in M_n(\mathbb{R})$ is the sum of two invertible matrices.


10. Let $A \in M_n(\mathbb{C})$ be an invertible matrix. Prove that for all $x \neq 0$ we have
$$\chi_{A^{-1}}(x) = \frac{x^n}{\chi_A(0)}\,\chi_A\left(\frac{1}{x}\right).$$
11. Let $A = [a_{ij}] \in M_n(\mathbb{C})$ and let $\bar{A} = [\bar{a}_{ij}]$ be the matrix whose entries are the complex conjugates of the entries of A. Prove that the characteristic polynomial of $A\bar{A}$ has real coefficients. Hint: use the fact that $A\bar{A}$ and $\bar{A}A$ have the same characteristic polynomial.

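12. Let $A \in M_{n,p}(\mathbb{C})$ and $B \in M_{p,n}(\mathbb{C})$.

a) Prove the following identities for $x \in \mathbb{C}$:
$$\begin{bmatrix} xI_n & A \\ B & I_p \end{bmatrix}\begin{bmatrix} I_n & O_{n,p} \\ -B & I_p \end{bmatrix} = \begin{bmatrix} xI_n - AB & A \\ O_{p,n} & I_p \end{bmatrix}$$
and
$$\begin{bmatrix} I_n & O_{n,p} \\ -B & xI_p \end{bmatrix}\begin{bmatrix} xI_n & A \\ B & I_p \end{bmatrix} = \begin{bmatrix} xI_n & A \\ O_{p,n} & xI_p - BA \end{bmatrix}.$$

b) Deduce that
$$X^p \chi_{AB}(X) = X^n \chi_{BA}(X).$$
13. Let A and B be matrices in $M_3(\mathbb{C})$. Show that
$$\det(AB - BA) = \frac{1}{3}\operatorname{Tr}\big((AB - BA)^3\big).$$
Hint: if a, b, c are the eigenvalues of $AB - BA$, prove that $a + b + c = 0$ and then that
$$a^3 + b^3 + c^3 = 3abc.$$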
14. Prove that for all $A, B \in M_n(\mathbb{C})$
$$\deg(\det(XA + B)) \leq \operatorname{rank}(A).$$
Hint: if r is the rank of A, start by reducing the problem to the case $A = \begin{bmatrix} I_r & O \\ O & O \end{bmatrix}$.

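15. Let A, B, C and D be square matrices in $M_n(\mathbb{C})$ and let
$$M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \in M_{2n}(\mathbb{C}).$$

a) Assume that $DC = CD$ and that D is invertible. Check the identity
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} D & O_n \\ -C & I_n \end{bmatrix} = \begin{bmatrix} AD - BC & B \\ O_n & D \end{bmatrix}$$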
and deduce that
$$\det M = \det(AD - BC).$$
b) Assume that $DC = CD$, but not necessarily that D is invertible. Prove that
$$\det M = \det(AD - BC).$$
Hint: consider the matrix $D_x = xI_n + D$ with $x \in \mathbb{C}$.
c) By considering the matrices
q= 10 B= 10 C= ROI, E
11 01 00 00
prove that the result in part b) no longer holds if we drop the hypothesis $CD = DC$.

16. a) Find two matrices $A, B \in M_n(\mathbb{R})$ with the same characteristic and minimal polynomial, but which are not similar.
b) Can we find two such matrices in $M_3(\mathbb{R})$?

17. Let $A = [a_{ij}] \in M_n(\mathbb{C})$ and let $s_k$ be the sum of all $k \times k$ principal minors of A (thus $s_1$ is the sum of the diagonal entries of A, that is $\operatorname{Tr}(A)$, while $s_n$ is $\det A$). Prove that
$$\chi_A(X) = X^n - s_1X^{n-1} + s_2X^{n-2} - \dots + (-1)^n s_n.$$
Hint: use the multilinear character of the determinant map.

18. Let $V = M_n(\mathbb{R})$ and consider the linear transformation $T: V \to V$ defined by
$$T(A) = -A + \operatorname{Tr}(A) \cdot I_n.$$
a) Prove that V is the direct sum of the eigenspaces of T.
b) Compute the characteristic polynomial of T.

19. Let $V = M_n(\mathbb{R})$ and consider the linear transformation $T: V \to V$ sending A to ${}^tA$. Find the characteristic polynomial of T. Hint: what is $T \circ T$?


8.5 The Cayley—Hamilton Theorem 


We now reach a truly beautiful result: any matrix is killed by its characteristic 
polynomial. Recall that $\chi_A$ denotes the characteristic polynomial of $A \in M_n(F)$.
Theorem 8.53 (Cayley-Hamilton). For all matrices $A \in M_n(F)$ we have
$$\chi_A(A) = O_n.$$
In other words, if $\chi_A(X) = X^n + a_{n-1}X^{n-1} + \dots + a_0$, then
$$A^n + a_{n-1}A^{n-1} + \dots + a_1A + a_0I_n = O_n.$$
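Before turning to proofs, one can at least test the statement on examples. The sketch below (our own; function names are hypothetical) evaluates $\chi_A(A)$ by Horner's scheme, replacing each coefficient c by $cI_n$, for the matrices of Problems 8.33 and 8.48. The coefficients of $\chi_A$ come from the Faddeev-LeVerrier recursion, a standard technique not taken from the text, valid in characteristic 0. A check on examples is, of course, no substitute for a proof.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly(A):
    """det(X I_n - A), highest degree first (Faddeev-LeVerrier, characteristic 0)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        coeffs.append(-trace(matmul(A, M)) / k)
    return coeffs


def poly_at_matrix(coeffs, A):
    """Evaluate a polynomial (coefficients highest degree first) at a square
    matrix A by Horner's scheme R <- R A + c I_n."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += c
    return R


examples = [
    [[0, 1, 0, 0], [2, 0, -1, 0], [0, 7, 0, 6], [0, 0, 3, 0]],  # Problem 8.33
    [[8, -1, -5], [-2, 3, 1], [4, -1, -1]],                     # Problem 8.48
]
for A in examples:
    R = poly_at_matrix(charpoly(A), A)
    print(all(x == 0 for row in R for x in row))  # True, as Theorem 8.53 predicts
```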


There are quite a few (at least 30...) different proofs of this result, none of them being straightforward. The reader should therefore start by finding the error in the following classical, but unfortunately wrong argument: since $\chi_A(X) = \det(XI_n - A)$, we have
$$\chi_A(A) = \det(AI_n - A) = \det(A - A) = \det(O_n) = 0.$$
Before moving to the rather technical proofs of the previous theorem, we take a break and focus on some applications:


Problem 8.54. Let $A \in M_n(F)$. Prove that the minimal polynomial of A divides the characteristic polynomial of A.

Solution. Since $\chi_A$ annihilates A by the Cayley-Hamilton theorem, it follows that $\mu_A$ divides $\chi_A$. □


Problem 8.55. Let $A \in M_n(F)$ be an invertible matrix. Prove that there are scalars $a_0, \dots, a_{n-1} \in F$ such that
$$A^{-1} = a_0I_n + a_1A + \dots + a_{n-1}A^{n-1}.$$

Solution. The characteristic polynomial of A is of the form $X^n + b_{n-1}X^{n-1} + \dots + b_1X + b_0$, with $b_0 = (-1)^n \det A$ nonzero. By the Cayley-Hamilton theorem
$$A^n + b_{n-1}A^{n-1} + \dots + b_1A + b_0I_n = O_n.$$
Multiplying by $-\frac{1}{b_0}A^{-1}$ we obtain
$$A^{-1} = -\frac{1}{b_0}A^{n-1} - \frac{b_{n-1}}{b_0}A^{n-2} - \dots - \frac{b_1}{b_0}I_n.$$
Thus we can take
$$a_0 = -\frac{b_1}{b_0}, \quad a_1 = -\frac{b_2}{b_0}, \quad \dots, \quad a_{n-1} = -\frac{1}{b_0}.$$
□
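The formula obtained in Problem 8.55 can be turned directly into a way of inverting a matrix. The sketch below (our own; names are hypothetical, arithmetic is exact over $\mathbb{Q}$) computes the coefficients $b_i$ of $\chi_A$ with the Faddeev-LeVerrier recursion (a standard technique not from the text, valid in characteristic 0), evaluates $A^{n-1} + b_{n-1}A^{n-2} + \dots + b_1I_n$ by Horner's scheme, and multiplies by $-\frac{1}{b_0}$. It is meant to illustrate the problem, not as an efficient inversion method.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly(A):
    """det(X I_n - A), highest degree first (Faddeev-LeVerrier, characteristic 0)."""
    n = len(A)
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]
        coeffs.append(-trace(matmul(A, M)) / k)
    return coeffs


def inverse_via_cayley_hamilton(A):
    """A^{-1} = -(1/b_0)(A^{n-1} + b_{n-1}A^{n-2} + ... + b_1 I_n),
    where chi_A = X^n + b_{n-1}X^{n-1} + ... + b_0 (Problem 8.55)."""
    n = len(A)
    coeffs = charpoly(A)          # [1, b_{n-1}, ..., b_1, b_0]
    b0 = coeffs[-1]
    if b0 == 0:
        raise ValueError("A is not invertible")
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs[:-1]:         # Horner: A^{n-1} + b_{n-1}A^{n-2} + ... + b_1 I_n
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += c
    return [[-x / b0 for x in row] for row in R]


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
A_inv = inverse_via_cayley_hamilton(A)
print(matmul(A, A_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```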
Problem 8.56. Let $A \in M_n(\mathbb{C})$ be a matrix. Prove that the following statements are equivalent:

a) A is nilpotent (recall that this means that $A^k = O_n$ for some $k \geq 1$).
b) The characteristic polynomial of A is $X^n$.

c) $A^n = O_n$.

d) The minimal polynomial of A is of the form $X^k$ for some $k \geq 1$.

Solution. The fact that a) implies b) follows directly from part a) of Problem 8.38. That b) implies c) is a direct consequence of the Cayley-Hamilton theorem. If c) holds, then $X^n$ kills A, thus the minimal polynomial of A divides $X^n$ and is monic, thus necessarily of the form $X^k$ for some $k \geq 1$, proving that c) implies d). Finally, since the minimal polynomial of A kills A, it is clear that d) implies a). □


We will give two proofs of the Cayley—Hamilton theorem in this section. Neither 
of them really explains clearly what is happening (the second one does a much better 
job than the first proof from this point of view), but with the technology we have 
developed so far, we cannot do any better. We will see later on a much better proof¹,
which reduces (via a subtle but very useful density argument) the theorem to the 
case of diagonal matrices (which is immediate). 

Let us give now the first proof of the Cayley-Hamilton theorem. Let $A \in M_n(F)$ and let $B = XI_n - A \in M_n(K)$, where $K = F(X)$ is the field of rational fractions² in the variable X, with coefficients in F. Consider the adjugate matrix $C = \operatorname{adj}(B)$ of B. Its entries are given by determinants of $(n-1) \times (n-1)$-matrices whose entries are polynomials of degree at most 1 in X. Thus each entry of C is a polynomial of degree at most $n - 1$ in X, with coefficients in F. Let
$$c_{ij} = c_{ij}^{(0)} + c_{ij}^{(1)}X + \dots + c_{ij}^{(n-1)}X^{n-1}$$
be the (i, j)-entry of C, with $c_{ij}^{(0)}, \dots, c_{ij}^{(n-1)} \in F$. Let $C^{(k)}$ be the matrix whose entries are the $c_{ij}^{(k)}$. Then
$$C = C^{(0)} + C^{(1)}X + \dots + C^{(n-1)}X^{n-1}.$$
Next, recall that
$$B \cdot C = B \cdot \operatorname{adj}(B) = \det B \cdot I_n = \chi_A(X) \cdot I_n.$$
Thus we have
$$(XI_n - A)(C^{(0)} + C^{(1)}X + \dots + C^{(n-1)}X^{n-1}) = \chi_A(X) \cdot I_n.$$

¹Which unfortunately works only when $F \subseteq \mathbb{C}$, even though one can actually deduce the theorem from this case.

²An element of K is a quotient $\frac{A}{B}$, where $A, B \in F[X]$ and $B \neq 0$.
Writing $\chi_A(X) = X^n + u_{n-1}X^{n-1} + \dots + u_0 \in F[X]$, the previous equality becomes
$$-AC^{(0)} + (C^{(0)} - AC^{(1)})X + (C^{(1)} - AC^{(2)})X^2 + \dots + (C^{(n-2)} - AC^{(n-1)})X^{n-1} + C^{(n-1)}X^n$$
$$= u_0I_n + u_1I_nX + \dots + u_{n-1}I_nX^{n-1} + I_nX^n.$$
Identifying coefficients yields
$$-AC^{(0)} = u_0I_n, \quad C^{(0)} - AC^{(1)} = u_1I_n, \quad \dots, \quad C^{(n-2)} - AC^{(n-1)} = u_{n-1}I_n, \quad C^{(n-1)} = I_n.$$
Dealing with these relations by starting with the last one and working backwards yields
$$C^{(n-1)} = I_n, \quad C^{(n-2)} = A + u_{n-1}I_n, \quad C^{(n-3)} = A^2 + u_{n-1}A + u_{n-2}I_n,$$
and an easy induction gives
$$C^{(n-1-j)} = A^j + u_{n-1}A^{j-1} + \dots + u_{n-j}I_n.$$
In particular
$$C^{(0)} = A^{n-1} + u_{n-1}A^{n-2} + \dots + u_1I_n.$$
Combining this with the relation $-AC^{(0)} = u_0I_n$ finally yields
$$A^n + u_{n-1}A^{n-1} + \dots + u_0I_n = O_n,$$
that is $\chi_A(A) = O_n$.

As the reader can easily observe, though rather long, the proof is fairly elemen- 
tary and based on very simple manipulations. It is not very satisfactory however, 
since it does not really show why the theorem holds. 

We turn now to the second proof of the Cayley-Hamilton theorem. We will 
actually prove the following result, which is clearly equivalent (via the choice of 
a basis) to the Cayley—Hamilton theorem. 


Theorem 8.57. Let V be a finite dimensional vector space over F and let $T: V \to V$ be a linear map. Then $\chi_T(T) = 0$.


Proof. The idea is to reduce the problem to linear maps for which we can easily compute $\chi_T$. The details are a little bit more complicated than this might suggest...
Fix a nonzero $x \in V$ (for $x = 0$ there is nothing to prove). If m is a nonnegative integer, let
$$W_m = \operatorname{Span}(T^0(x), T^1(x), \dots, T^m(x)).$$
Note that $W_0 \subseteq W_1 \subseteq \dots \subseteq V$ and that $\dim W_m \leq \dim W_{m+1} \leq \dim V$ for all $m \geq 0$. Hence there must be some least $m \geq 1$ such that $\dim W_{m-1} = \dim W_m$. Since $W_{m-1} \subseteq W_m$, we must have $W_{m-1} = W_m$, in other words $T^m(x)$ lies in the subspace $W_{m-1}$ and we can write $T^m(x)$ as a linear combination of the $T^k(x)$ for $0 \leq k < m$, say
$$T^m(x) = \sum_{k=0}^{m-1} a_k T^k(x).$$
Note that this implies that $W_{m-1}$ is stable under T. Since m is minimal, the vectors $T^0(x), \dots, T^{m-1}(x)$ must be linearly independent (a linear dependence among them would express a lower iterate as a linear combination of earlier iterates). Therefore they form a basis of $W_{m-1}$ and with respect to this basis the matrix of $T_1 = T|_{W_{m-1}}$ is
$$A = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & a_0 \\ 1 & 0 & 0 & \cdots & 0 & a_1 \\ 0 & 1 & 0 & \cdots & 0 & a_2 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & a_{m-1} \end{bmatrix}.$$
The characteristic polynomial of this matrix was computed in Problem 8.39 and it equals $X^m - a_{m-1}X^{m-1} - \dots - a_0$. Hence
$$\chi_{T_1}(T)(x) = T^m(x) - \sum_{k=0}^{m-1} a_k T^k(x) = 0.$$
By Problem 8.45, since $W_{m-1}$ is T-stable, the characteristic polynomial $\chi_{T_1}$ of T restricted to $W_{m-1}$ divides $\chi_T$, say $\chi_T = Q \cdot \chi_{T_1}$ with $Q \in F[X]$. Therefore $\chi_T(T)(x) = Q(T)\big(\chi_{T_1}(T)(x)\big) = Q(T)(0) = 0$. Since x was arbitrary, we conclude that $\chi_T(T)$ vanishes when applied to any vector, that is, it is the zero linear map. □
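The proof of Theorem 8.57 is constructive: starting from x, it applies T until the new vector falls into the span of the previous ones, and reads off the coefficients $a_0, \dots, a_{m-1}$. The sketch below (our own; function names are hypothetical, and T is taken to be $v \mapsto Av$ on $\mathbb{Q}^n$ with exact arithmetic) carries out this search. By Problem 8.39 the polynomial $X^m - a_{m-1}X^{m-1} - \dots - a_0$ is the characteristic polynomial of the restriction of T to $W_{m-1}$, hence a divisor of $\chi_T$.

```python
from fractions import Fraction


def solve_in_span(vectors, target):
    """Coefficients c with sum_k c[k] * vectors[k] == target, or None if target is
    not in the span. The vectors are assumed to be linearly independent."""
    n, k = len(target), len(vectors)
    M = [[vectors[j][i] for j in range(k)] + [target[i]] for i in range(n)]
    for c in range(k):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)  # exists by independence
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    if any(M[r][k] != 0 for r in range(k, n)):
        return None
    return [M[r][k] for r in range(k)]


def krylov_relation(A, x):
    """For T(v) = A v and a nonzero x, return [a_0, ..., a_{m-1}] with
    T^m(x) = a_0 x + a_1 T(x) + ... + a_{m-1} T^{m-1}(x), where m is the least index
    such that T^m(x) lies in Span(x, T(x), ..., T^{m-1}(x))."""
    if all(v == 0 for v in x):
        raise ValueError("x must be nonzero")
    n = len(A)
    iterates = [[Fraction(v) for v in x]]
    while True:
        nxt = [sum(A[i][j] * iterates[-1][j] for j in range(n)) for i in range(n)]
        coeffs = solve_in_span(iterates, nxt)
        if coeffs is not None:
            return coeffs
        iterates.append(nxt)


A = [[8, -1, -5], [-2, 3, 1], [4, -1, -1]]           # the matrix of Problem 8.48
print([int(a) for a in krylov_relation(A, [1, 0, 0])])  # [32, -32, 10]
print([int(a) for a in krylov_relation(A, [1, 1, 1])])  # [2]
```

For $x = (1, 0, 0)$ one gets $m = 3$ and the relation recovers $X^3 - 10X^2 + 32X - 32 = \chi_A$ itself; for the eigenvector $(1, 1, 1)$ one gets $m = 1$ and the divisor $X - 2$.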


8.5.1 Problems for Practice 


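1. Prove that for any $A = [a_{ij}] \in M_3(\mathbb{C})$ we have
$$A^3 - \operatorname{Tr}(A) \cdot A^2 + \operatorname{Tr}(\operatorname{adj}A) \cdot A - \det A \cdot I_3 = O_3.$$
2. Let $A \in M_3(\mathbb{R})$ be a matrix such that
$$\operatorname{Tr}(A) = \operatorname{Tr}(A^2) = 0.$$
Prove that $A^3 = aI_3$ for some real number a.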
3. Let $A, B \in M_3(\mathbb{C})$ be matrices such that the traces of AB and $(AB)^2$ are both 0. Prove that $(AB)^3 = (BA)^3$.
4. Let $A, B, C \in M_n(\mathbb{C})$ be matrices such that $AC = CB$ and $C \neq O_n$.

a) Prove that for all polynomials $P \in \mathbb{C}[T]$ we have
$$P(A)C = CP(B).$$

b) By choosing a suitable polynomial P and using the Cayley-Hamilton theorem, deduce that A and B have a common eigenvalue.

5. Let $A, B \in M_n(\mathbb{C})$ be matrices such that $(AB)^n = O_n$. Prove that $(BA)^n = O_n$. Hint: prove first that $(BA)^{n+1} = O_n$, then use Problem 8.56.

6. Let $A \in M_n(\mathbb{C})$ be a matrix such that A and 3A are similar. Prove that $A^n = O_n$. Hint: similar matrices have the same characteristic polynomial. Also use Problem 8.56.

7. Let $A \in M_n(\mathbb{C})$. Prove that $A^n = O_n$ if and only if $\operatorname{Tr}(A^k) = 0$ for all $k \geq 1$. Hint: to establish the harder direction, prove that all eigenvalues of A must be 0 and use Problem 8.56.

8. Let V be a vector space of dimension n over a field F and let $T: V \to V$ be a linear transformation. The goal of this problem is to prove that the following assertions are equivalent:

i) There exists a vector $x \in V$ such that $x, T(x), \dots, T^{n-1}(x)$ form a basis of V.
ii) The minimal polynomial and the characteristic polynomial of T coincide.

a) Assume that i) holds. Use Problem 8.14 to prove that $\deg \mu_T \geq n$ and conclude that ii) holds using the Cayley-Hamilton theorem.

b) Assume that ii) holds. Using Problems 8.13 and 8.14, explain why we can find $x \in V$ such that $x, T(x), T^2(x), \dots$ span V. Conclude that i) holds.

9. Let $n \geq 1$ and let $A, B \in M_n(\mathbb{Z})$ be matrices with integer entries. Suppose that $\det A$ and $\det B$ are relatively prime. Prove that we can find matrices $U, V \in M_n(\mathbb{Z})$ such that $AU + BV = I_n$.


Chapter 9 
Diagonalizability 


Abstract The main focus is on diagonalizable matrices, that is, matrices similar to a diagonal one. We completely characterize these matrices and use this to complete the proof of Jordan’s classification theorem for arbitrary matrices with complex entries. Along the way, we prove that diagonalizable matrices with complex entries are dense and use this to give a clean proof of the Cayley-Hamilton theorem.


Keywords Diagonalizable · Trigonalizable · Jordan block · Jordan’s classification


In this chapter we will apply the results obtained in the previous chapter to study matrices which are as close as possible to diagonal ones. The diagonal matrices are fairly easy to understand and so are matrices similar to diagonal matrices. These are called diagonalizable matrices and play a fundamental role in linear algebra. For instance, we will prove that diagonalizable matrices form a dense subset of $M_n(\mathbb{C})$ (i.e., any matrix in $M_n(\mathbb{C})$ can be approximated to arbitrary precision by a diagonalizable matrix) and we will use this result to give a very simple proof of the Cayley-Hamilton theorem over $\mathbb{C}$, by reducing it to the case of diagonal matrices (which is trivial). Also, we will prove that any matrix $A \in M_n(\mathbb{C})$ is the sum of a nilpotent matrix and a diagonalizable matrix which commute with each other, showing once more the importance of diagonalizable (and nilpotent) matrices. We then use the classification of nilpotent matrices obtained in the chapter concerned with duality to prove the general form of Jordan’s theorem, classifying all matrices in $M_n(\mathbb{C})$ up to similarity. Along the way, we give applications to the resolution of linear differential equations (of any order) with constant coefficients, as well as to linear recurrence sequences.

A large part of the chapter is devoted to finding intrinsic properties and characterizations of diagonalizable matrices. In this chapter $F$ will be a field, but the reader will not lose anything by assuming that $F$ is either $\mathbb{R}$ or $\mathbb{C}$.
9.1 Upper-Triangular Matrices, Once Again


Recall that a matrix $A = [a_{ij}] \in M_n(F)$ is called upper-triangular if $a_{ij} = 0$ whenever $i > j$, that is, all entries of $A$ below the main diagonal are zero. We have already established quite a few results about upper-triangular matrices, which make this class of matrices rather easy to understand. For instance, we have already seen that the upper-triangular matrices form a vector subspace of $M_n(F)$ which is closed under multiplication. Moreover, it is easy to compute the eigenvalues of an upper-triangular matrix: simply look at the diagonal entries! It is therefore easy to compute the characteristic polynomial of such a matrix: if $A = [a_{ij}]$ is an upper-triangular matrix, then its characteristic polynomial is
$$\chi_A(X) = \prod_{i=1}^{n} (X - a_{ii}).$$


Before dealing with diagonalizable matrices, we will focus on the trigonalizable ones, i.e., matrices $A \in M_n(F)$ which are similar to an upper-triangular matrix. We will need an important definition:


Definition 9.1. A polynomial $P \in F[X]$ is split over $F$ if it is of the form
$$P(X) = c(X - a_1) \cdots (X - a_n)$$
for some scalars $c, a_1, \ldots, a_n \in F$ (not necessarily distinct).


For instance, $X^2 + 1$ is not split over $\mathbb{R}$ since it has no real root, but it is split over $\mathbb{C}$, since $X^2 + 1 = (X + i)(X - i)$. On the other hand, the polynomial $X^2 - 3X + 2$ is split over $\mathbb{R}$, since it factors as $(X - 1)(X - 2)$. It is pointless to look for a polynomial in $\mathbb{C}[X]$ which is not split, due to the following amazing theorem of Gauss:


Theorem 9.2 (The Fundamental Theorem of Algebra). Any polynomial $P \in \mathbb{C}[X]$ is split over $\mathbb{C}$.


This theorem is usually stated as: $\mathbb{C}$ is an algebraically closed field, that is, any nonconstant polynomial equation with complex coefficients has at least one complex solution. The previous theorem is actually equivalent to this usual version of Gauss’ theorem (and it is a good exercise for the reader to prove the equivalence of these two statements).

By the previous discussion, the characteristic polynomial of an upper-triangular 
matrix is split over F. Since the characteristic polynomials of two similar matrices 
are equal, we deduce that the characteristic polynomial of any trigonalizable matrix 
is split over F. 


Problem 9.3. Give an example of a matrix $A \in M_2(\mathbb{R})$ which is not trigonalizable in $M_2(\mathbb{R})$.
Solution. Since the characteristic polynomial of a trigonalizable matrix is split over $\mathbb{R}$, it suffices to find a matrix $A \in M_2(\mathbb{R})$ whose characteristic polynomial is not split over $\mathbb{R}$. Consider the matrix
$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$
Its characteristic polynomial is $X^2 + 1$, which is not split over $\mathbb{R}$. Thus $A$ is not trigonalizable in $M_2(\mathbb{R})$. □


The following fundamental theorem gives an intrinsic characterization of trigonalizable matrices.


Theorem 9.4. Let $A \in M_n(F)$ be a matrix. Then the following assertions are equivalent:

a) The characteristic polynomial of $A$ is split over $F$.
b) $A$ is similar to an upper-triangular matrix in $M_n(F)$.


Proof. The discussion preceding the theorem shows that b) implies a). We will prove the converse by induction on $n$. It is clearly true for $n = 1$, so assume that $n \geq 2$ and that the statement holds for $n - 1$.

Choose a root $\lambda \in F$ of the characteristic polynomial $\chi_A$ of $A$ (we can do it, thanks to the hypothesis that $\chi_A$ is split over $F$), and choose a nonzero vector $v_1 \in F^n$ such that $Av_1 = \lambda v_1$. Since $v_1 \neq 0$, we can complete $v_1$ to a basis $v_1, \ldots, v_n$ of $V = F^n$. The matrix of the linear transformation $T$ attached to $A$ with respect to the basis $v_1, \ldots, v_n$ is of the form
$$\begin{bmatrix} \lambda & * \\ 0 & B \end{bmatrix}$$
for some $B \in M_{n-1}(F)$. Thus we can find an invertible matrix $P_1$ such that
$$P_1AP_1^{-1} = \begin{bmatrix} \lambda & * \\ 0 & B \end{bmatrix}.$$
Since similar matrices have the same characteristic polynomial, we obtain
$$\chi_A(X) = \chi_{P_1AP_1^{-1}}(X) = (X - \lambda)\chi_B(X),$$
the last equality being a consequence of Theorem 7.43. It follows that $\chi_B$ is split over $F$. Since $B \in M_{n-1}(F)$, we can apply the inductive hypothesis and find an invertible matrix $Q \in M_{n-1}(F)$ such that $QBQ^{-1}$ is upper-triangular. Let
$$P_2 = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix};$$
then $P_2 \in M_n(F)$ is invertible (again by Theorem 7.43 we have $\det P_2 = \det Q \neq 0$) and
$$P_2(P_1AP_1^{-1})P_2^{-1} = \begin{bmatrix} \lambda & * \\ 0 & QBQ^{-1} \end{bmatrix}$$
is upper-triangular. Setting $P = P_2P_1$, the matrix $PAP^{-1}$ is upper-triangular, as desired. □


Combining the previous two theorems, we obtain the following very important 
result: 


Corollary 9.5. For any matrix $A \in M_n(\mathbb{C})$ we can find an invertible matrix $P \in M_n(\mathbb{C})$ and an upper-triangular matrix $T \in M_n(\mathbb{C})$ such that $A = PTP^{-1}$. Thus any matrix $A \in M_n(\mathbb{C})$ is trigonalizable in $M_n(\mathbb{C})$.


Proof. By Gauss’ theorem the characteristic polynomial $\chi_A$ of $A$ is split over $\mathbb{C}$. The result follows from Theorem 9.4. □


As a beautiful application of Corollary 9.5, let us give yet another proof of the Cayley-Hamilton theorem for matrices in $M_n(\mathbb{C})$ (the result applies of course to matrices in $M_n(\mathbb{Q})$ or $M_n(\mathbb{R})$). Recall that this theorem says that $\chi_A(A) = O_n$ for any matrix $A \in M_n(\mathbb{C})$, where $\chi_A$ is the characteristic polynomial of $A$. We will prove this in two steps: first, we reduce to the case when $A$ is upper-triangular, then we prove the theorem in this case by a straightforward argument.

Let $A \in M_n(\mathbb{C})$ be a matrix and let $P$ be an invertible matrix such that the matrix $T = PAP^{-1}$ is upper-triangular. We want to prove that $\chi_A(A) = O_n$, but
$$\chi_A(A) = \chi_A(P^{-1}TP) = P^{-1}\chi_A(T)P = P^{-1}\chi_T(T)P,$$
the last equality being a consequence of the fact that $A$ and $T$ are similar, thus have the same characteristic polynomial. Hence it suffices to prove that $\chi_T(T) = O_n$; in other words, we may and will assume that $A$ is upper-triangular.

Let $e_1, \ldots, e_n$ be the canonical basis of $\mathbb{C}^n$ and consider the polynomials
$$Q_k(X) = \prod_{i=1}^{k} (X - a_{ii}),$$
so that $Q_n = \chi_A$ (since $A$ is upper-triangular). We claim that $Q_k(A)e_i = 0$ for $1 \leq i \leq k$ and for all $1 \leq k \leq n$. Accepting this for a moment, it follows that $Q_n(A)e_i = 0$ for all $1 \leq i \leq n$, that is $\chi_A(A)e_i = 0$ for all $1 \leq i \leq n$, which is exactly saying that $\chi_A(A) = O_n$.

It remains to prove the claim, and we will do this by induction on $k$. If $k = 1$, we need to check that $Q_1(A)e_1 = 0$, that is $(A - a_{11}I_n)e_1 = 0$, or equivalently that the first column of $A - a_{11}I_n$ is zero, which is clear since $A$ is upper-triangular. Assume now that $Q_k(A)e_i = 0$ for $1 \leq i \leq k$, and let us prove that $Q_{k+1}(A)e_i = 0$ for $1 \leq i \leq k + 1$. If $1 \leq i \leq k$, then $Q_k(A)e_i = 0$ yields
$$Q_{k+1}(A)e_i = (A - a_{k+1,k+1}I_n)Q_k(A)e_i = 0.$$
If $i = k + 1$, then, since $A$ is upper-triangular, its $(k+1)$-th column is $Ae_{k+1} = \sum_{i=1}^{k+1} a_{i,k+1}e_i$, hence
$$Q_{k+1}(A)e_{k+1} = Q_k(A)(A - a_{k+1,k+1}I_n)e_{k+1} = Q_k(A)(Ae_{k+1} - a_{k+1,k+1}e_{k+1}) = \sum_{i=1}^{k} a_{i,k+1}Q_k(A)e_i = 0,$$
since $Q_k(A)e_i = 0$ for $1 \leq i \leq k$. The inductive step is established and the claim is proved.
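The claim in the proof says that the first $k$ columns of $Q_k(A)$ vanish, which can be checked mechanically on any particular upper-triangular matrix. The following Python sketch does so with exact rational arithmetic; the helper names (`mat_mul`, `partial_products`, `claim_holds`) and the sample matrix are illustrative choices of ours, not part of the text.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def partial_products(A):
    """Return [Q_1(A), ..., Q_n(A)] where Q_k(X) = (X - a_11)(X - a_22)...(X - a_kk)."""
    n = len(A)
    Q = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # Q_0(A) = I_n
    result = []
    for k in range(n):
        shifted = [[A[i][j] - (A[k][k] if i == j else 0) for j in range(n)] for i in range(n)]
        Q = mat_mul(shifted, Q)  # Q_{k+1}(A) = (A - a_{k+1,k+1} I_n) Q_k(A)
        result.append(Q)
    return result

def claim_holds(A):
    """Check that Q_k(A) e_i = 0 for 1 <= i <= k, i.e. the first k columns of Q_k(A) vanish."""
    n = len(A)
    for k, Qk in enumerate(partial_products(A), start=1):
        for col in range(k):
            if any(Qk[row][col] != 0 for row in range(n)):
                return False
    return True

A = [[Fraction(x) for x in row] for row in [[2, 1, -3], [0, 5, 4], [0, 0, -1]]]
print(claim_holds(A))                                      # True
print(all(x == 0 for row in partial_products(A)[-1] for x in row))  # True: chi_A(A) = O_3
```

Replacing the sample matrix by any other upper-triangular matrix should again give `True` twice; for a matrix that is not upper-triangular the column condition can already fail at $k = 1$, which is why the proof first reduces to the triangular case.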


Problem 9.6. Let $A \in M_n(\mathbb{C})$ and let $Q \in \mathbb{C}[X]$ be a polynomial. If the characteristic polynomial of $A$ equals $\prod_{i=1}^{n} (X - \lambda_i)$, prove that the characteristic polynomial of $Q(A)$ equals $\prod_{i=1}^{n} (X - Q(\lambda_i))$.


Solution. By the previous corollary we can write $A = PTP^{-1}$ for some $P \in GL_n(\mathbb{C})$ and some upper-triangular matrix $T$. The characteristic polynomial of $T$ is the same as that of $A$, and it is also equal to $\prod_{i=1}^{n} (X - t_{ii})$ if $T = [t_{ij}]$. Thus the diagonal entries of $T$ are $\lambda_1, \ldots, \lambda_n$ (up to a permutation). Next, $Q(A) = PQ(T)P^{-1}$ and the characteristic polynomial of $Q(A)$ is the same as that of $Q(T)$. But $Q(T)$ is again upper-triangular, with diagonal entries $Q(\lambda_1), \ldots, Q(\lambda_n)$ (up to a permutation), so
$$\chi_{Q(A)}(X) = \chi_{Q(T)}(X) = \prod_{i=1}^{n} (X - Q(\lambda_i)).$$
□


Problem 9.7. Let $A \in M_n(\mathbb{C})$ have eigenvalues $\lambda_1, \ldots, \lambda_n$ (counted with their algebraic multiplicities). Prove that for all $Q \in \mathbb{C}[X]$ we have
$$\det Q(A) = \prod_{i=1}^{n} Q(\lambda_i), \qquad \operatorname{Tr} Q(A) = \sum_{i=1}^{n} Q(\lambda_i).$$

Solution. Simply combine the previous problem with Problem 8.50. □
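Problem 9.7 can be tested numerically on the matrix of Problem 9.21 below, whose eigenvalues (with algebraic multiplicities) are $5, 5, -3$. In this minimal sketch the polynomial $Q(X) = X^2 + 1$ is an arbitrary choice of ours, $Q(A)$ is evaluated by Horner's rule, and $\det$ is computed by Gaussian elimination over $\mathbb{Q}$. By Problem 9.7 the expected values are $26 \cdot 26 \cdot 10 = 6760$ and $26 + 26 + 10 = 62$.

```python
from fractions import Fraction
from math import prod

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def poly_of_matrix(coeffs, A):
    """Evaluate Q(A) by Horner's rule; coeffs[k] is the coefficient of X**k."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)  # result <- result * A + c * I_n
        for i in range(n):
            result[i][i] += c
    return result

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    sign, d = 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n):
                M[r][j] -= f * M[c][j]
    return sign * d

A = [[-7, -16, 4], [6, 13, -2], [12, 16, 1]]  # matrix of Problem 9.21
eigenvalues = [5, 5, -3]                      # with algebraic multiplicities
coeffs = [1, 0, 1]                            # Q(X) = 1 + X^2

def Q(x):
    return sum(c * x**k for k, c in enumerate(coeffs))

QA = poly_of_matrix(coeffs, A)
print(det(QA), prod(Q(l) for l in eigenvalues))                          # 6760 6760
print(sum(QA[i][i] for i in range(3)), sum(Q(l) for l in eigenvalues))  # 62 62
```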


9.1.1 Problems for Practice 


1. For each of the following matrices decide whether $A$ is trigonalizable over $\mathbb{R}$ or not:

a) $A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 2 & 2 \\ 0 & 1 & 1 \end{bmatrix}$.


vy a=|)5] 
25 
2. Find all real numbers $x$ for which the matrix $A =$

x
x − 1   2 + x

is trigonalizable in $M_2(\mathbb{R})$.

3. Find an upper-triangular matrix which is similar to the matrix

[3]

4. Find an upper-triangular matrix which is similar to the matrix
$$\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 2 & 1 \end{bmatrix}.$$


5. A matrix $A \in M_3(\mathbb{C})$ has eigenvalues $1, 2, -1$. Find the trace and the determinant
of Æ +2A + h. 

6. Let $A \in M_n(F)$ be a matrix. Prove that $A$ is nilpotent if and only if $A$ is similar to an upper-triangular matrix all of whose diagonal entries are 0.

7. Let $A, B \in M_n(\mathbb{C})$ be matrices such that $AB = BA$.


a) Prove that each eigenspace of B is stable under the linear transformation 
attached to A. 

b) Deduce that A and B have a common eigenvector. 

c) Prove by induction on $n$ that there is an invertible matrix $P$ such that $PAP^{-1}$ and $PBP^{-1}$ are both upper-triangular.


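8. Let $A, B \in M_n(\mathbb{C})$ be two matrices. Recall that the Kronecker or tensor product of $A$ and $B$ is the matrix $A \otimes B \in M_{n^2}(\mathbb{C})$ defined by
$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}B & a_{n2}B & \cdots & a_{nn}B \end{bmatrix}.$$
We recall that
$$(A \otimes B)(A' \otimes B') = (AA') \otimes (BB')$$
for all matrices $A, A', B, B' \in M_n(\mathbb{C})$.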


a) Consider two invertible matrices $P, Q$ such that $P^{-1}AP$ and $Q^{-1}BQ$ are upper-triangular. Prove that $(P \otimes Q)^{-1}(A \otimes B)(P \otimes Q)$ is also upper-triangular and describe its diagonal entries in terms of the eigenvalues of $A$ and $B$.
b) Deduce that if
$$\chi_A(X) = \prod_{i=1}^{n} (X - \lambda_i) \quad \text{and} \quad \chi_B(X) = \prod_{i=1}^{n} (X - \mu_i),$$
then
$$\chi_{A \otimes B}(X) = \prod_{i=1}^{n} \prod_{j=1}^{n} (X - \lambda_i\mu_j).$$


9.2 Diagonalizable Matrices and Linear Transformations 


Diagonal matrices are fairly easy to understand and study. In this section we study 
those matrices which are as close as possible to being diagonal: the matrices 
which are similar to a diagonal matrix. We fix a field F. All vector spaces will 
be considered over F and will be finite-dimensional. 


Definition 9.8. a) A matrix $A \in M_n(F)$ is called diagonalizable if it is similar to a diagonal matrix in $M_n(F)$.

b) A linear transformation $T : V \to V$ on a vector space $V$ is called diagonalizable if its matrix in some basis of $V$ is diagonal.


Thus a matrix $A \in M_n(F)$ is diagonalizable if and only if we can write
$$A = PDP^{-1}$$
for some invertible matrix $P \in M_n(F)$ and some diagonal matrix $D = [d_{ij}] \in M_n(F)$.
Note that any matrix which is similar to a diagonalizable matrix is itself diagonalizable. In particular, if $T$ is a diagonalizable linear transformation, then the matrix of $T$ with respect to any basis of $V$ is still diagonalizable (but not diagonal in general).

We can give a completely intrinsic characterization of diagonalizable linear 
transformations, with no reference to a choice of basis or to matrices: 


Theorem 9.9. A linear transformation $T : V \to V$ on a vector space $V$ is diagonalizable if and only if there is a basis of $V$ consisting of eigenvectors of $T$.


Proof. Suppose that $T$ is diagonalizable. Thus there is a basis $v_1, \ldots, v_n$ of $V$ such that the matrix $A$ of $T$ with respect to this basis is diagonal. If $(a_{ii})_{1 \leq i \leq n}$ are the diagonal entries of $A$, then by definition $T(v_i) = a_{ii}v_i$ for all $1 \leq i \leq n$, thus $v_1, \ldots, v_n$ is a basis of $V$ consisting of eigenvectors for $T$.
Conversely, suppose that there is a basis $v_1, \ldots, v_n$ of $V$ consisting of eigenvectors for $T$. If $T(v_i) = d_iv_i$, then the matrix of $T$ with respect to $v_1, \ldots, v_n$ is diagonal, with diagonal entries $d_1, \ldots, d_n$, thus $T$ is diagonalizable. □
Remark 9.10. One can use these ideas to find an explicit way to diagonalize a matrix $A$. If $A \in M_n(F)$ is diagonalizable, then we find a basis of $V = F^n$ consisting of eigenvectors and we let $P$ be the matrix whose columns are this basis. Then $P^{-1}AP = D$ is diagonal and $A = PDP^{-1}$.


Remark 9.11. Suppose that $A$ is diagonalizable and write $A = PDP^{-1}$ for some diagonal matrix $D$ and some invertible matrix $P$.

a) The characteristic polynomials of $A$ and $D$ are the same, since $A$ and $D$ are similar. We deduce that
$$\prod_{i=1}^{n} (X - d_{ii}) = \chi_A(X).$$
In particular, the diagonal entries of $D$ are (up to a permutation) the eigenvalues of $A$ (counted with algebraic multiplicities). This is very useful in practice.

b) Let $\lambda$ be an eigenvalue of $A$. Then the algebraic multiplicity of $\lambda$ equals the number of indices $i \in [1, n]$ for which $d_{ii} = \lambda$ (this follows from a)). On the other hand, the geometric multiplicity of $\lambda$ as eigenvalue of $A$ or $D$ is the same (since $X \mapsto P^{-1}X$ induces an isomorphism between $\ker(\lambda I_n - A)$ and $\ker(\lambda I_n - D)$, thus these two spaces have the same dimension). But it is not difficult to see that the geometric multiplicity of $\lambda$ as eigenvalue of $D$ is the number of indices $i \in [1, n]$ for which $d_{ii} = \lambda$, since the system $DX = \lambda X$ is equivalent to the equations $(d_{ii} - \lambda)x_i = 0$ for $1 \leq i \leq n$. We conclude that for a diagonalizable matrix, the algebraic multiplicity of any eigenvalue equals its geometric multiplicity.


Problem 9.12. Show that 


is not diagonalizable when $a \neq 0$.


Solution. Suppose that $A$ is diagonalizable and write $A = PDP^{-1}$ with $P$ invertible and $D$ diagonal. Since $A$ is upper-triangular with diagonal entries equal to 1, we deduce that all eigenvalues of $A$ are equal to 1. By the previous remark the diagonal entries of $D$ must all be equal to 1 and so $D = I_n$. But then $A = PI_nP^{-1} = I_n$, a contradiction since $a \neq 0$. □


Problem 9.13. Prove that the only nilpotent and diagonalizable matrix $A \in M_n(F)$ is the zero matrix.

Solution. Suppose that $A$ is diagonalizable and nilpotent and write $A = PDP^{-1}$. By Problem 8.38 and the previous remark we obtain
$$X^n = \chi_A(X) = \prod_{i=1}^{n} (X - d_{ii}).$$
Thus $d_{ii} = 0$ for all $i$ and then $D = O_n$ and $A = PO_nP^{-1} = O_n$. □




The study of diagonalizable matrices is more involved than that of trigonalizable 
ones. Before proving the main theorem characterizing diagonalizable matrices, we 
will prove a technical result, which is extremely useful in other situations as well 
(the reader will find two more beautiful applications of this result in the next 
section). 

Let $k > 1$ be an integer and let $P_1, \ldots, P_k$ be pairwise relatively prime polynomials in $F[X]$. Denote by $P = P_1 \cdots P_k$ the product of these $k$ polynomials.


Problem 9.14. Let $Q_i = P/P_i$. Prove that $Q_1, \ldots, Q_k$ are relatively prime, i.e., there is no nonconstant polynomial $Q$ dividing all of $Q_1, \ldots, Q_k$.


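Solution. If some nonconstant polynomial divides all the $Q_i$, then so does each of its irreducible factors; so suppose there is an irreducible polynomial $Q$ that divides $Q_i$ for all $i$. Since $Q \mid Q_1 = P_2 \cdots P_k$, we deduce that $Q$ divides $P_j$ for some $j \in \{2, \ldots, k\}$. But since $Q$ divides $Q_j$, it also divides $P_i$ for some $i \neq j$, contradicting that $P_i$ and $P_j$ are relatively prime. □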


Note that it is definitely not true that $Q_1, \ldots, Q_k$ are themselves pairwise relatively prime: if $k > 2$, then both $Q_1$ and $Q_2$ are multiples of $P_k$.
The technical result we need is the following:


Theorem 9.15. Suppose that $T$ is a linear transformation on some $F$-vector space $V$ (not necessarily finite dimensional). Then for any pairwise relatively prime polynomials $P_1, \ldots, P_k \in F[X]$ we have
$$\ker P(T) = \bigoplus_{i=1}^{k} \ker P_i(T),$$
where $P = P_1P_2 \cdots P_k$.


Proof. Consider the polynomials $Q_i = P/P_i$ as in the previous problem. Since they are relatively prime, Bezout’s lemma¹ yields the existence of polynomials $R_1, \ldots, R_k$ such that
$$Q_1R_1 + \cdots + Q_kR_k = 1. \qquad (9.1)$$
Since $P_i$ divides $P$, it follows that $\ker P_i(T) \subset \ker P(T)$ for all $i \in [1, k]$. On the other hand, take $x \in \ker P(T)$ and let $x_i = (Q_iR_i)(T)(x)$. Then relation (9.1) shows that
$$x = x_1 + x_2 + \cdots + x_k.$$

¹This lemma says that if $A, B \in F[X]$ are relatively prime polynomials, then we can find polynomials $U, V \in F[X]$ such that $AU + BV = 1$. This easily yields the following more general statement: if $P_1, \ldots, P_k$ are polynomials whose greatest common divisor is 1, then we can find polynomials $U_1, \ldots, U_k$ such that $U_1P_1 + \cdots + U_kP_k = 1$.




Moreover, $P_i(T)(x_i) = (P_iQ_iR_i)(T)(x)$ and $P_iQ_iR_i$ is a multiple of $P$. Since $x \in \ker P(T) \subset \ker (P_iQ_iR_i)(T)$, it follows that $x_i \in \ker P_i(T)$, and since $x = x_1 + \cdots + x_k$, we conclude that
$$\ker P(T) = \sum_{i=1}^{k} \ker P_i(T).$$

It remains to prove that if $x_i \in \ker P_i(T)$ and $x_1 + \cdots + x_k = 0$, then $x_i = 0$ for all $i \in [1, k]$. We have
$$Q_1(T)(x_1) + Q_1(T)(x_2) + \cdots + Q_1(T)(x_k) = 0.$$
But $Q_1(T)(x_2) = \cdots = Q_1(T)(x_k) = 0$, since $Q_1$ is a multiple of $P_2, \ldots, P_k$ and $P_2(T)(x_2) = \cdots = P_k(T)(x_k) = 0$. Thus $Q_1(T)(x_1) = 0$ and similarly $Q_j(T)(x_j) = 0$ for $1 \leq j \leq k$. Moreover, for $j \geq 2$ the polynomial $Q_j$ is a multiple of $P_1$, so $Q_j(T)(x_1) = 0$. But then, by (9.1),
$$x_1 = (R_1Q_1)(T)(x_1) + \cdots + (R_kQ_k)(T)(x_1) = 0,$$
and similarly we obtain $x_2 = \cdots = x_k = 0$. The theorem is proved. □
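The footnote only asserts the existence of $U$ and $V$; for two polynomials over a field they can be computed by the extended Euclidean algorithm, which records how each remainder is expressed in terms of the original pair. The sketch below assumes $F = \mathbb{Q}$, stores a polynomial as a list of coefficients (lowest degree first), and uses hypothetical helper names of our own; it is an illustration, not the method used in the text.

```python
from fractions import Fraction

# A polynomial is a list of coefficients, lowest degree first: [c0, c1, c2] = c0 + c1*X + c2*X^2.
# The zero polynomial is the empty list.

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def scale(p, c):
    return trim([c * a for a in p])

def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def poly_divmod(a, b):
    """Long division: return (q, r) with a = q*b + r and deg r < deg b (b nonzero)."""
    b = trim(b)
    r = trim([Fraction(x) for x in a])
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        c = r[-1] / b[-1]
        shift = len(r) - len(b)
        q[shift] = c
        r = add(r, [0] * shift + scale(b, -c))  # cancels the leading term of r
    return trim(q), r

def bezout(a, b):
    """Return (u, v) with u*a + v*b = 1, assuming a and b are relatively prime."""
    r0, r1 = trim(a), trim(b)
    u0, u1 = [Fraction(1)], []
    v0, v1 = [], [Fraction(1)]
    # invariant: u0*a + v0*b = r0 and u1*a + v1*b = r1
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, add(u0, scale(mul(q, u1), -1))
        v0, v1 = v1, add(v0, scale(mul(q, v1), -1))
    if len(r0) != 1:
        raise ValueError("polynomials are not relatively prime")
    lead = Fraction(r0[0])  # the last nonzero remainder is a nonzero constant
    return scale(u0, 1 / lead), scale(v0, 1 / lead)

P1 = [1, 0, 1]  # X^2 + 1
P2 = [-1, 1]    # X - 1
U, V = bezout(P1, P2)
print(add(mul(U, P1), mul(V, P2)))  # [Fraction(1, 1)], i.e. U*P1 + V*P2 = 1
```

For $P_1 = X^2 + 1$ and $P_2 = X - 1$ this returns $U = \tfrac12$ and $V = -\tfrac12(X + 1)$, and indeed $\tfrac12(X^2 + 1) - \tfrac12(X + 1)(X - 1) = 1$. The case of $k$ polynomials in the footnote can be reduced to repeated use of the two-polynomial version.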


We are now ready to prove the fundamental theorem concerning diagonalizable 
linear transformations. 


Theorem 9.16. Let $V$ be a finite dimensional vector space over $F$ and let $T : V \to V$ be a linear transformation. The following assertions are equivalent:

a) $T$ is diagonalizable.

b) There is a polynomial $P \in F[X]$ which splits over $F$ and has pairwise distinct roots, such that $P(T) = 0$.

c) The minimal polynomial $\mu_T$ of $T$ splits over $F$ and has pairwise distinct roots.

d) Let $\operatorname{Sp}(T) \subset F$ be the set of eigenvalues of $T$. Then
$$\bigoplus_{\lambda \in \operatorname{Sp}(T)} \ker(T - \lambda \cdot \mathrm{id}) = V.$$

Proof. We start by proving that a) implies b). Choose a basis in which $T$ is represented by the diagonal matrix $D$. Let $P$ be the monic polynomial whose roots are the distinct diagonal entries of $D$, each taken with multiplicity one. Then $P(T)$ is represented by the diagonal matrix $P(D)$, whose diagonal entries are $P(d_{ii}) = 0$. Thus $P(T) = 0$.

That b) implies c) is clear since the minimal polynomial of $T$ divides $P$ and hence it splits over $F$, with distinct roots.

That c) implies d) is just Theorem 9.15 applied to $P = \mu_T$ and to $P_i$ its linear factors: indeed $\ker \mu_T(T) = V$, and the roots of $\mu_T$ are exactly the eigenvalues of $T$.

Finally, to see that d) implies a), write $\operatorname{Sp}(T) = \{\lambda_1, \ldots, \lambda_k\}$ and choose a basis $v_1, \ldots, v_n$ of $V$ obtained by patching a basis of $\ker(T - \lambda_1 \cdot \mathrm{id})$, followed by a basis of $\ker(T - \lambda_2 \cdot \mathrm{id})$, ..., followed by a basis of $\ker(T - \lambda_k \cdot \mathrm{id})$. Then $v_1, \ldots, v_n$ form a basis of eigenvectors of $T$, thus a) holds by Theorem 9.9. □
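Remark 9.17. a) If $T$ is a diagonalizable linear transformation, then Example 8.9 shows that the minimal polynomial of $T$ is
$$\mu_T(X) = \prod_{\lambda \in \operatorname{Sp}(T)} (X - \lambda),$$
the product being taken over all eigenvalues of $T$, counted without multiplicities. Taking the same product, but counting multiplicities (algebraic or geometric, they are the same) of eigenvalues this time, we obtain the characteristic polynomial of $T$.

b) If $T$ is any linear transformation on a finite dimensional vector space $V$, then $T$ is diagonalizable if and only if the sum of the dimensions of the eigenspaces of $T$ equals $\dim V$, i.e.,
$$\sum_{\lambda \in \operatorname{Sp}(T)} \dim \ker(T - \lambda \cdot \mathrm{id}) = \dim V.$$
Indeed, this follows from the theorem and the fact that the subspaces $\ker(T - \lambda \cdot \mathrm{id})$ are always in direct sum position.

c) Suppose that $T$ is diagonalizable. For each $\lambda \in \operatorname{Sp}(T)$ let $\pi_\lambda$ be the projection on the subspace $\ker(T - \lambda \cdot \mathrm{id})$ with respect to the decomposition d) of Theorem 9.16. Then
$$T = \sum_{\lambda \in \operatorname{Sp}(T)} \lambda \pi_\lambda.$$
This follows from $\bigoplus_{\lambda \in \operatorname{Sp}(T)} \ker(T - \lambda \cdot \mathrm{id}) = V$ and the fact that if
$$v = \sum_{\lambda \in \operatorname{Sp}(T)} v_\lambda \quad \text{with } v_\lambda \in \ker(T - \lambda \cdot \mathrm{id}),$$
then
$$T(v) = \sum_{\lambda \in \operatorname{Sp}(T)} T(v_\lambda) = \sum_{\lambda \in \operatorname{Sp}(T)} \lambda v_\lambda = \sum_{\lambda \in \operatorname{Sp}(T)} \lambda \pi_\lambda(v).$$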
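When all the $P_i$ are linear, say $P_i = X - \lambda_i$ with distinct $\lambda_i$, the Bezout relation (9.1) can be written down explicitly with the Lagrange interpolation polynomials $L_i(X) = \prod_{j \neq i} (X - \lambda_j)/(\lambda_i - \lambda_j)$, which sum to 1. For a diagonalizable matrix, $L_i(A)$ is then the projection $\pi_{\lambda_i}$ of part c). The following Python sketch (helper names are ours) checks this on the matrix of Problem 9.24, where $\pi_4 = 5I_3 - A$ and $\pi_5 = A - 4I_3$.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def spectral_projections(A, spectrum):
    """Return {lam: L_lam(A)} with L_lam(X) = prod_{mu != lam} (X - mu) / (lam - mu)."""
    n = len(A)
    projections = {}
    for lam in spectrum:
        L = identity(n)
        for mu in spectrum:
            if mu != lam:
                factor = [[(A[i][j] - (mu if i == j else 0)) / Fraction(lam - mu)
                           for j in range(n)] for i in range(n)]
                L = mat_mul(L, factor)
        projections[lam] = L
    return projections

A = [[4, 0, -2], [2, 5, 4], [0, 0, 5]]  # Problem 9.24, Sp(A) = {4, 5}
pi = spectral_projections(A, [4, 5])
I3 = identity(3)

sum_of_projections = [[pi[4][i][j] + pi[5][i][j] for j in range(3)] for i in range(3)]
recombined = [[4 * pi[4][i][j] + 5 * pi[5][i][j] for j in range(3)] for i in range(3)]
print(sum_of_projections == I3)                              # True
print(all(mat_mul(p, p) == p for p in pi.values()))          # True: each pi is a projection
print(mat_mul(pi[4], pi[5]) == [[0] * 3 for _ in range(3)])  # True
print(recombined == A)                                       # True: A = 4 pi_4 + 5 pi_5
```

Here the Lagrange polynomials play the role of the products $Q_iR_i$ in the proof of Theorem 9.15, with each $R_i$ a constant.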


Due to its importance, we will restate the previous theorem in terms of matrices:
Theorem 9.18. Let $A \in M_n(F)$. Then the following assertions are equivalent:

a) $A$ is diagonalizable in $M_n(F)$.
b) If $\operatorname{Sp}(A)$ is the set of eigenvalues of $A$, then
$$\bigoplus_{\lambda \in \operatorname{Sp}(A)} \ker(\lambda I_n - A) = F^n.$$

c) The minimal polynomial $\mu_A$ of $A$ is split over $F$, with pairwise distinct roots.
d) There is a polynomial $P \in F[X]$ which is split over $F$, with pairwise distinct roots and such that $P(A) = O_n$.
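Once the distinct eigenvalues are known (for instance from a split characteristic polynomial), criterion c) amounts to checking whether $\prod_{\lambda \in \operatorname{Sp}(A)} (A - \lambda I_n) = O_n$, since $\mu_A$ has the same roots as $\chi_A$. Here is a minimal Python sketch, assuming the spectrum is supplied by hand. The test matrices are those of Problem 9.23 below with $a = 0$ (spectrum $\{0, 1\}$) and $a = 2$ (spectrum $\{0, 1, 2\}$).

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def kills_with_simple_roots(A, spectrum):
    """Return True if prod_{lam in spectrum} (A - lam I_n) is the zero matrix."""
    n = len(A)
    product = [[int(i == j) for j in range(n)] for i in range(n)]
    for lam in spectrum:
        shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
        product = mat_mul(product, shifted)
    return all(x == 0 for row in product for x in row)

A0 = [[2, 1, -2], [1, 0, -1], [1, 1, -1]]  # Problem 9.23 with a = 0
A2 = [[2, 1, -2], [1, 2, -1], [1, 1, -1]]  # Problem 9.23 with a = 2
print(kills_with_simple_roots(A0, [0, 1]))     # False: A0 is not diagonalizable
print(kills_with_simple_roots(A2, [0, 1, 2]))  # True: A2 is diagonalizable
```

The check is only meaningful when `spectrum` lists all eigenvalues of $A$ in $F$ and $\chi_A$ splits over $F$; it does not compute eigenvalues itself.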
In the following problems the reader will have the opportunity to test their understanding of the various statements involved in the previous theorem.


Problem 9.19. Explain why the matrix $A$ with real entries is diagonalizable in each of the following two cases.

(a) The matrix $A$ has characteristic polynomial $X^3 - 3X^2 + 2X$.
(b)
$$A = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 & 0 \\ 0 & 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 1 & 4 \end{bmatrix}.$$

Solution. (a) We have
$$X^3 - 3X^2 + 2X = X(X^2 - 3X + 2) = X(X - 1)(X - 2),$$
which is split, with distinct roots. Since this polynomial kills $A$ (by the Cayley-Hamilton theorem), the result follows from the implication “b) implies a)” in Theorem 9.16. We can also argue directly, as follows: if $v_1, v_2, v_3$ are eigenvectors corresponding to the eigenvalues $0, 1, 2$, then $v_1, v_2, v_3$ are linearly independent (since the eigenvalues are distinct) and thus must form a basis of $\mathbb{R}^3$. Thus $A$ is diagonalizable (by Theorem 9.9 and the discussion preceding it).
(b) We have
$$\chi_A(X) = \det(XI_5 - A) = (X - 1)(X - 3)(X - 4)[(X - 3)(X - 4) - 2]$$
$$= (X - 1)(X - 3)(X - 4)(X^2 - 7X + 10) = (X - 1)(X - 2)(X - 3)(X - 4)(X - 5).$$

This polynomial is split with distinct roots, so the same argument as in part (a) yields the result. □


Problem 9.20. Consider the matrix
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$

a) Is $A$ diagonalizable in $M_3(\mathbb{C})$?
b) Is $A$ diagonalizable in $M_3(\mathbb{R})$?

Solution. One easily finds that the characteristic polynomial of $A$ is $\chi_A(X) = X^3 - 1$. This polynomial is split with distinct roots in $\mathbb{C}[X]$, thus $A$ is diagonalizable in $M_3(\mathbb{C})$. On the other hand, $A$ is not diagonalizable in $M_3(\mathbb{R})$, since its characteristic polynomial does not split in $\mathbb{R}[X]$. □
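Over $\mathbb{C}$ the diagonalization can be made explicit. Since $A$ cyclically shifts coordinates, for each cube root of unity $\omega$ the vector $(1, \omega, \omega^2)$ is an eigenvector of $A$ for the eigenvalue $\omega$. The Python sketch below checks this in floating point, hence the tolerance. The vectors and the tolerance are our own choices. Only the root $\omega = 1$ is real, which is consistent with part b).

```python
import cmath

A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

cube_roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]  # 1, omega, omega^2
for w in cube_roots:
    v = [1, w, w * w]  # candidate eigenvector for the eigenvalue w
    Av = apply(A, v)
    residual = max(abs(Av[i] - w * v[i]) for i in range(3))
    print(w, residual < 1e-12)
```

The three eigenvalues are distinct, so the three eigenvectors are linearly independent (their coordinates form a Vandermonde matrix), and they form the columns of a matrix $P$ with $P^{-1}AP$ diagonal.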


Problem 9.21. Let
$$A = \begin{bmatrix} -7 & -16 & 4 \\ 6 & 13 & -2 \\ 12 & 16 & 1 \end{bmatrix} \in M_3(\mathbb{R}).$$

(a) Prove that $\lambda = 5$ is an eigenvalue of $A$.
(b) Diagonalize $A$, if possible.

Solution. (a) We have
$$A - 5I_3 = \begin{bmatrix} -12 & -16 & 4 \\ 6 & 8 & -2 \\ 12 & 16 & -4 \end{bmatrix}$$
and the last row is the opposite of the first row. Thus $A - 5I_3$ is not invertible and 5 is an eigenvalue of $A$.

(b) We take advantage of part (a) and study the 5-eigenspace of $A$. This is described by the system of equations
$$\begin{cases} -12x - 16y + 4z = 0 \\ 6x + 8y - 2z = 0 \\ 12x + 16y - 4z = 0 \end{cases}$$
As we have already remarked in part (a), the first and the third equations are equivalent. The system is then equivalent (after dividing the first equation by 4 and the second one by 2) to
$$\begin{cases} -3x - 4y + z = 0 \\ 3x + 4y - z = 0 \end{cases}$$
Again, the first and second equations are equivalent. Thus the 5-eigenspace is
$$\ker(A - 5I_3) = \{(x, y, 3x + 4y) \mid x, y \in \mathbb{R}\}$$
and this is a two-dimensional vector space, with a basis given by
$$v_1 = (1, 0, 3), \quad v_2 = (0, 1, 4).$$

We deduce that 5 has algebraic multiplicity at least 2. Since the sum of the complex eigenvalues of $A$, counted with algebraic multiplicities, equals the trace of $A$, which is $-7 + 13 + 1 = 7$, the remaining eigenvalue $\lambda$ satisfies $5 + 5 + \lambda = 7$. Thus $-3$ is another eigenvalue of $A$, and the corresponding eigenspace is a line. Solving the system $AX = -3X$ yields the solution $v_3 = (-2, 1, 2)$. We deduce that a diagonalization of $A$ is given by
$$A = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 1 \\ 3 & 4 & 2 \end{bmatrix} \begin{bmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & -3 \end{bmatrix} \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 1 \\ 3 & 4 & 2 \end{bmatrix}^{-1}. \qquad □$$
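The diagonalization can be double-checked without inverting $P$. Since $P$ is invertible, $A = PDP^{-1}$ is equivalent to $AP = PD$. Here is a short Python check in integer arithmetic; the helper names are ours.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[-7, -16, 4], [6, 13, -2], [12, 16, 1]]
P = [[1, 0, -2], [0, 1, 1], [3, 4, 2]]  # columns v1, v2, v3
D = [[5, 0, 0], [0, 5, 0], [0, 0, -3]]

print(mat_mul(A, P) == mat_mul(P, D))  # True
print(det3(P))                         # 4, so P is invertible
```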


Problem 9.22. Consider the matrix 


010 
A=|-440]| € M3(R). 
Se 


Is this matrix diagonalizable? 


Solution. We start by computing the characteristic polynomial. The first two rows of $A$ have zero third entry, so expanding $\det(XI_3 - A)$ along its third column gives
$$\chi_A(X) = (X - 2)\begin{vmatrix} X & -1 \\ 4 & X - 4 \end{vmatrix} = (X - 2)(X^2 - 4X + 4) = (X - 2)^3.$$

Thus 2 is an eigenvalue of $A$ with algebraic multiplicity 3. If $A$ were diagonalizable, then 2 would have geometric multiplicity 3, that is $\ker(A - 2I_3)$ would be three-dimensional and $A = 2I_3$. Since this is certainly not the case, it follows that $A$ is not diagonalizable. □


Problem 9.23. Find all values of $a \in \mathbb{R}$ for which the matrix
$$A = \begin{bmatrix} 2 & 1 & -2 \\ 1 & a & -1 \\ 1 & 1 & -1 \end{bmatrix} \in M_3(\mathbb{R})$$
is diagonalizable.

Solution. As usual, we start by computing the characteristic polynomial $\chi_A(X)$ of $A$. Adding the first column to the third one, then subtracting the first row from the third one, we obtain
$$\chi_A(X) = \begin{vmatrix} X - 2 & -1 & 2 \\ -1 & X - a & 1 \\ -1 & -1 & X + 1 \end{vmatrix} = \begin{vmatrix} X - 2 & -1 & X \\ -1 & X - a & 0 \\ 1 - X & 0 & 0 \end{vmatrix} = X(X - 1)(X - a).$$

If $a \notin \{0, 1\}$, then $\chi_A(X)$ is split with distinct roots and since it kills $A$ (by the Cayley-Hamilton theorem), we deduce that $A$ is diagonalizable.
Suppose that $a = 0$, thus 0 is an eigenvalue of $A$ with algebraic multiplicity 2. Let us find its geometric multiplicity, which comes down to solving the system $AX = 0$. This system reads
$$\begin{cases} 2x_1 + x_2 - 2x_3 = 0 \\ x_1 - x_3 = 0 \\ x_1 + x_2 - x_3 = 0 \end{cases}$$
and its solutions are $(x_1, 0, x_1)$ with $x_1 \in \mathbb{R}$. As this space is one-dimensional, we deduce that the geometric multiplicity of 0 is 1 and so $A$ is not diagonalizable.

If $a = 1$, a similar argument shows that 1 is an eigenvalue with algebraic multiplicity 2 and with geometric multiplicity 1, thus $A$ is not diagonalizable. All in all, the answer to the problem is: all $a \in \mathbb{R} \setminus \{0, 1\}$. □
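The geometric multiplicities in this solution can also be obtained as $\dim \ker(A - \lambda I_3) = 3 - \operatorname{rank}(A - \lambda I_3)$. Remark 9.17 b) then says that $A$ is diagonalizable exactly when these dimensions add up to 3. A minimal Python sketch with exact Gauss-Jordan elimination follows; the helper names are ours. It assumes the spectrum is read off from $\chi_A(X) = X(X - 1)(X - a)$ as computed above.

```python
from fractions import Fraction

def rank(M):
    """Rank by Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def geometric_multiplicity(A, lam):
    """dim ker(A - lam I_n) = n - rank(A - lam I_n)."""
    n = len(A)
    return n - rank([[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)])

def matrix_9_23(a):
    return [[2, 1, -2], [1, a, -1], [1, 1, -1]]

for a in [0, 1, 2]:
    spectrum = sorted({0, 1, a})  # distinct roots of X(X - 1)(X - a)
    A = matrix_9_23(a)
    dims = [geometric_multiplicity(A, lam) for lam in spectrum]
    print(a, dims, sum(dims) == 3)
# 0 [1, 1] False
# 1 [1, 1] False
# 2 [1, 1, 1] True
```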


Problem 9.24. Diagonalize, if possible, the matrix
$$A = \begin{bmatrix} 4 & 0 & -2 \\ 2 & 5 & 4 \\ 0 & 0 & 5 \end{bmatrix} \in M_3(\mathbb{R}).$$

Solution. We start by computing the characteristic polynomial of $A$:
$$\chi_A(X) = \begin{vmatrix} X - 4 & 0 & 2 \\ -2 & X - 5 & -4 \\ 0 & 0 & X - 5 \end{vmatrix} = (X - 5)\begin{vmatrix} X - 4 & 0 \\ -2 & X - 5 \end{vmatrix} = (X - 4)(X - 5)^2.$$

We deduce that $A$ has two eigenvalues, namely 4 with algebraic multiplicity 1 and 5 with algebraic multiplicity 2. Next, we study separately the corresponding eigenspaces. Since 4 has algebraic multiplicity 1, we already know that the 4-eigenspace will be a line. To find it, we write the condition $AX = 4X$ as the system
$$\begin{cases} 4x - 2z = 4x \\ 2x + 5y + 4z = 4y \\ 5z = 4z \end{cases}$$
This system can easily be solved: the last equation gives $z = 0$, the first one becomes tautological and the second one gives $y = -2x$. Thus the 4-eigenspace is the line spanned by $v_1 = (1, -2, 0)$.

Next, we study the 5-eigenspace. Write the equation $AX = 5X$ as the system
$$\begin{cases} 4x - 2z = 5x \\ 2x + 5y + 4z = 5y \\ 5z = 5z \end{cases}$$
The last equation is tautological. The first equation gives $x = -2z$ and the second one then becomes tautological. Thus the solutions of the system are $(-2z, y, z) = y(0, 1, 0) + z(-2, 0, 1)$ with $y, z \in \mathbb{R}$. We deduce that the 5-eigenspace is two-dimensional, with a basis given by $v_2 = (0, 1, 0)$ and $v_3 = (-2, 0, 1)$.

Since the sum of the dimensions of the eigenspaces equals $3 = \dim \mathbb{R}^3$, we deduce that $A$ is diagonalizable and $v_1, v_2, v_3$ form a basis of eigenvectors. The matrix $P$ whose columns are the coordinates of $v_1, v_2, v_3$ with respect to the canonical basis is
$$P = \begin{bmatrix} 1 & 0 & -2 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
We have
$$A = PDP^{-1}, \quad \text{with } D = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{bmatrix}. \qquad □$$
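To see the factorization $A = PDP^{-1}$ concretely, one can compute $P^{-1}$ by Gauss-Jordan elimination and multiply out. The Python sketch below does this exactly over $\mathbb{Q}$; the helper names are ours. It should find $P^{-1} = \begin{bmatrix} 1 & 0 & 2 \\ 2 & 1 & 4 \\ 0 & 0 & 1 \end{bmatrix}$, which is easy to confirm by hand.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def inverse(M):
    """Gauss-Jordan inversion over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]  # make the pivot equal to 1
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

A = [[4, 0, -2], [2, 5, 4], [0, 0, 5]]
P = [[1, 0, -2], [-2, 1, 0], [0, 0, 1]]  # columns v1, v2, v3
D = [[4, 0, 0], [0, 5, 0], [0, 0, 5]]

P_inv = inverse(P)
print(P_inv == [[1, 0, 2], [2, 1, 4], [0, 0, 1]])  # True
print(mat_mul(mat_mul(P, D), P_inv) == A)          # True
```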


We end this section with some more theoretical exercises. 


Problem 9.25. Let $T$ be a diagonalizable linear transformation on a finite dimensional vector space $V$ over a field $F$. Let $W$ be a subspace of $V$ which is stable under $T$. Prove that $T|_W : W \to W$ is diagonalizable.


Solution. Since $T : V \to V$ is diagonalizable, there is a polynomial $P \in F[X]$ of the form $P = (X - \lambda_1) \cdots (X - \lambda_k)$ with $\lambda_1, \ldots, \lambda_k \in F$ pairwise distinct, such that $P(T) = 0$. Since $P(T)(v) = 0$ for all $v \in V$, we have $P(T)(w) = 0$ for all $w \in W$. As $W$ is stable under $T$, we have $P(T|_W) = P(T)|_W$, thus $P(T|_W) = 0$ and so $T|_W$ is diagonalizable by Theorem 9.16. □


The result established in the next problem is very useful in many situations. 


Problem 9.26. Let $V$ be a finite dimensional vector space over a field $F$ and let $T_1, T_2 : V \to V$ be linear transformations of $V$. Prove that if $T_1$ and $T_2$ commute, then any eigenspace of $T_2$ is stable under $T_1$.

Solution. Let $\lambda \in F$ be an eigenvalue of $T_2$ and let $E_\lambda = \ker(\lambda \cdot \mathrm{id} - T_2)$ be the corresponding eigenspace. If $v \in E_\lambda$, then $T_2(v) = \lambda v$, thus
$$T_2(T_1(v)) = T_1(T_2(v)) = T_1(\lambda v) = \lambda T_1(v)$$
and so $T_1(v) \in E_\lambda$. The result follows. □


Problem 9.27. Let $V$ be a finite dimensional vector space over a field $F$ and let $T_1, T_2 : V \to V$ be diagonalizable linear transformations of $V$. Prove that $T_1$ and $T_2$ commute if and only if they are simultaneously diagonalizable.

Solution. Suppose first that $T_1$ and $T_2$ are simultaneously diagonalizable. Thus there is a basis $\mathcal{B}$ of $V$ in which the matrices of $T_1$ and $T_2$ are diagonal, say $D_1$ and $D_2$. We clearly have $D_1D_2 = D_2D_1$, thus the matrices of $T_1 \circ T_2$ and $T_2 \circ T_1$ coincide in this basis and so $T_1 \circ T_2 = T_2 \circ T_1$.

Conversely, suppose that $T_1$ and $T_2$ commute. Let $\lambda_1, \ldots, \lambda_k$ be the distinct eigenvalues of $T_1$ and let $W_i = \ker(T_1 - \lambda_i \cdot \mathrm{id})$ be the corresponding eigenspaces. Since $T_1$ is diagonalizable, we have $V = W_1 \oplus \cdots \oplus W_k$. Since $T_1$ and $T_2$ commute, $T_2$ leaves each $W_i$ invariant by Problem 9.26. Since $T_2$ is diagonalizable, so is its restriction to $W_i$, by Problem 9.25. Thus there is a basis $\mathcal{B}_i$ of $W_i$ consisting of eigenvectors for $T_2|_{W_i}$. Consider the basis $\mathcal{B}'$ consisting of all vectors in $\mathcal{B}_1 \cup \cdots \cup \mathcal{B}_k$. Then $\mathcal{B}'$ consists of eigenvectors for both $T_1$ and $T_2$ (this is clear for $T_2$, and holds for $T_1$ since $T_1$ acts on $W_i$ by the scalar $\lambda_i$). Thus the matrices of $T_1$ and $T_2$ in the basis $\mathcal{B}'$ are both diagonal and the result follows. □


Problem 9.28. Let $A$ be an invertible matrix with complex entries and let $d \geq 1$. Prove that $A$ is diagonalizable if and only if $A^d$ is diagonalizable. What happens if we don’t assume that $A$ is invertible?

Solution. Suppose that $A$ is diagonalizable, thus there is an invertible matrix $P$ such that $PAP^{-1}$ is a diagonal matrix. Then $(PAP^{-1})^d = PA^dP^{-1}$ is also a diagonal matrix, hence $A^d$ is diagonalizable. This implication does not require the hypothesis that $A$ is invertible.

Suppose now that $A^d$ is diagonalizable and that $A$ is invertible. Since $A^d$ is diagonalizable and invertible, its minimal polynomial is of the form $(X - \lambda_1) \cdots (X - \lambda_k)$ with $\lambda_1, \ldots, \lambda_k$ pairwise distinct and nonzero. Consider the polynomial
$$P(X) = (X^d - \lambda_1) \cdots (X^d - \lambda_k).$$
Since each $\lambda_i$ is nonzero, each of the polynomials $X^d - \lambda_1, \ldots, X^d - \lambda_k$ has pairwise distinct roots, and since these polynomials are pairwise relatively prime, their product $P$ has pairwise distinct roots; moreover $P$ splits over $\mathbb{C}$ by Theorem 9.2. Since $P(A) = \mu_{A^d}(A^d) = O_n$, we deduce that $A$ is diagonalizable.

Finally, if we only assume that $A^d$ is diagonalizable and $A$ is not invertible, then one of the eigenvalues of $A$ is 0. Hence one of the factors of the polynomial $P(X)$ above becomes $X^d$. Since this does not have distinct roots (for $d \geq 2$), the proof breaks down. Indeed $A$ need not be diagonalizable in this case. For instance, consider the matrix $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. This matrix satisfies $A^2 = O_2$, thus $A^2$ is certainly diagonalizable. However, $A$ is not diagonalizable. Indeed, if this were the case, then $A$ would necessarily be the zero matrix, since its eigenvalues are all 0. Hence for the more difficult implication one cannot drop the hypothesis that $A$ is invertible. □


Problem 9.29. Let $A$ be a matrix with real entries such that $A^3 = A^2$.

a) Prove that $A^2$ is diagonalizable.
b) Find $A$ if its trace equals the number of columns of $A$.


Solution. a) The hypothesis yields $A^4 = A^3 = A^2$, thus $(A^2)^2 = A^2$. It follows that $A^2$ is killed by the polynomial $X(X - 1)$, which has pairwise distinct real roots. Thus $A^2$ is diagonalizable.
b) Let $n$ be the number of columns of $A$. The trace of $A$ equals $n$, and this is also the sum of the complex eigenvalues of $A$, counted with their algebraic multiplicities. By hypothesis each eigenvalue $\lambda$ satisfies $\lambda^3 = \lambda^2$, thus $\lambda \in \{0, 1\}$. Since the $n$ eigenvalues add up to $n$, it follows that all of them are equal to $1$. Thus all eigenvalues of $A^2$ are $1$ and using part a) we deduce that $A^2 = I_n$. Combining this with the hypothesis yields $A \cdot I_n = I_n$ and then $A = I_n$, which is the unique solution of the problem.

□
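Part a) really concerns $A^2$: the hypothesis $A^3 = A^2$ alone does not make $A$ diagonalizable. The Python sketch below (helper names ours) checks this on the hypothetical example $A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, whose trace is $1$ rather than $3$, so it is not a counterexample to part b).

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, d):
    """A^d by repeated multiplication (d >= 0)."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(d):
        R = mat_mul(R, A)
    return R


def rank(M):
    """Rank via exact Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


A = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 1]]
A2 = mat_pow(A, 2)

print(mat_pow(A, 3) == A2)    # True: the hypothesis A^3 = A^2 holds
print(mat_mul(A2, A2) == A2)  # True: (A^2)^2 = A^2, as in part a)
print(A2)                     # [[0, 0, 0], [0, 0, 0], [0, 0, 1]], already diagonal
# A is upper triangular with diagonal (0, 0, 1): the eigenvalue 0 has algebraic
# multiplicity 2, but its eigenspace has dimension 3 - rank(A) = 1.
print(3 - rank(A))            # 1, so A is not diagonalizable
```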


Problem 9.30. Let $A_1 \in M_p(F)$ and $A_2 \in M_q(F)$ and let
$$A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \in M_{p+q}(F).$$
Prove that $A$ is diagonalizable if and only if $A_1$ and $A_2$ are diagonalizable.


Solution. If $P \in F[X]$ is a polynomial, then
$$P(A) = \begin{pmatrix} P(A_1) & 0 \\ 0 & P(A_2) \end{pmatrix}.$$


If $A$ is diagonalizable, then we can find a polynomial $P$ which splits over $F$ into a product of distinct linear factors and which kills $A$. By the previous formula, $P$ also kills $A_1$ and $A_2$, which must therefore be diagonalizable.

Suppose now that $A_1$ and $A_2$ are diagonalizable, thus we can find polynomials $P_1, P_2$ which split over $F$ into products of distinct linear factors and which kill $A_1$ and $A_2$ respectively. Let $P$ be the least common multiple of $P_1$ and $P_2$. Then $P$ splits into a product of distinct linear factors and, being a multiple of both $P_1$ and $P_2$, it kills $A_1$ and $A_2$, hence kills $A$ by the previous formula. Therefore $A$ is diagonalizable.

An alternative solution is based on the study of eigenspaces of $A$. Namely, it is not difficult to see that for any $\lambda \in F$ we have (identifying $F^{p+q}$ with $F^p \times F^q$)
$$\ker(A - \lambda I_{p+q}) = \ker(A_1 - \lambda I_p) \oplus \ker(A_2 - \lambda I_q).$$
Now a matrix $X \in M_n(F)$ is diagonalizable if and only if $\bigoplus_{\lambda \in F} \ker(X - \lambda I_n) = F^n$, from where the result follows easily. □
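The block formula for $P(A)$ is easy to test on a small case. In the Python sketch below (matrices as lists of rows; the blocks $A_1 = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}$, $A_2 = (3)$ and the helper names are illustrative choices), we check that $P(A) = \operatorname{diag}(P(A_1), P(A_2))$ and that the least common multiple $(X-1)(X-2)(X-3)$ of $P_1 = (X-1)(X-2)$ and $P_2 = X - 3$, expanded by hand, kills $A$.

```python
def mat_mul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def poly_eval(coeffs, M):
    """Evaluate c0*I + c1*M + ... + cr*M^r by Horner's rule (coeffs = [c0, ..., cr])."""
    n = len(M)
    R = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        R = mat_mul(R, M)
        for i in range(n):
            R[i][i] += c
    return R


def block_diag(A1, A2):
    """The block-diagonal matrix with diagonal blocks A1 and A2."""
    p, q = len(A1), len(A2)
    return [row + [0] * q for row in A1] + [[0] * p + row for row in A2]


A1 = [[1, 1], [0, 2]]  # killed by P1 = (X - 1)(X - 2) = X^2 - 3X + 2
A2 = [[3]]             # killed by P2 = X - 3
A = block_diag(A1, A2)

# P(A) = diag(P(A1), P(A2)), here for P = X^2 + 1.
P = [1, 0, 1]
print(poly_eval(P, A) == block_diag(poly_eval(P, A1), poly_eval(P, A2)))  # True

# lcm(P1, P2) = (X - 1)(X - 2)(X - 3) = X^3 - 6X^2 + 11X - 6 kills A.
print(poly_eval([-6, 11, -6, 1], A) == [[0] * 3 for _ in range(3)])  # True
```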


9.2.1 Problems for Practice 


1. a) Diagonalize the matrix 


in $M_2(\mathbb{C})$.
b) Do the same by considering $A$ as an element of $M_2(\mathbb{R})$.
2. For each matrix $A$ below, decide if $A$ is diagonalizable. Explain your reasoning. If $A$ is diagonalizable, find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.

(a)
$$A = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 1 & 0 & 5 \end{pmatrix} \in M_3(\mathbb{R})$$

(b)
$$A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 5 & 0 \\ 1 & 0 & 5 \end{pmatrix} \in M_3(\mathbb{R})$$


3. a) Let $a_1, \dots, a_n$ be complex numbers and let $A = [a_ia_j]_{1 \le i, j \le n} \in M_n(\mathbb{C})$. When is $A$ diagonalizable?
b) Let $a_1, \dots, a_n$ be real numbers and let $A = [a_ia_j]_{1 \le i, j \le n} \in M_n(\mathbb{R})$. When is $A$ diagonalizable?
4. Let $A$ be the $n \times n$ matrix all of whose entries are equal to $1$. Prove that $A \in M_n(\mathbb{R})$ is diagonalizable and find its eigenvalues.
5. Compute the $n$th power of the matrix
$$A = \begin{pmatrix} 1 & 3 & 3 \\ 3 & 1 & 3 \\ 3 & 3 & 1 \end{pmatrix}.$$
Hint: diagonalize $A$.
6. Find all differentiable maps $x, y, z : \mathbb{R} \to \mathbb{R}$ such that $x(0) = 1$, $y(0) = 0$, $z(0) = 0$ and
$$x' = y + z, \quad y' = x + z, \quad z' = x - 3y + 4z.$$
Hint: the matrix $A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & -3 & 4 \end{pmatrix}$ has an eigenvalue equal to $-1$. Use this to diagonalize $A$. How is this related to the original problem?
7. Let $V$ be a finite dimensional vector space over $\mathbb{C}$ and let $T : V \to V$ be a linear transformation.

a) Prove that if $T$ is diagonalizable, then $T^2$ is diagonalizable and $\ker T = \ker T^2$.

b) Prove that if $T^2$ is diagonalizable and $\ker T = \ker T^2$, then $T$ is diagonalizable.
8. Let $A, B \in M_n(F)$ be matrices such that $A$ is invertible and $AB$ is diagonalizable. Prove that $BA$ is also diagonalizable. What happens if we don't assume that $A$ is invertible?


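9. Find all matrices $A \in M_3(\mathbb{R})$ such that
$$A^2 = \begin{pmatrix} 9 & 0 & 0 \\ 1 & 4 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
Hint: start by diagonalizing the matrix $\begin{pmatrix} 9 & 0 & 0 \\ 1 & 4 & 0 \\ 1 & 1 & 1 \end{pmatrix}$ and prove that any solution of the problem is diagonalizable and commutes with this matrix.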
10. Let $A \in M_n(\mathbb{C})$ be a matrix such that $A^d = I_n$ for some positive integer $d$. Prove that

a) $A$ is diagonalizable with eigenvalues $d$th roots of unity.
b) Deduce that
$$\dim \ker(A - I_n) = \frac{1}{d}\sum_{i=1}^{d} \operatorname{Tr}(A^i).$$


11. Let $\mathcal{F}$ be an arbitrary family of diagonalizable matrices in $M_n(\mathbb{C})$. Suppose that $AB = BA$ for all $A, B \in \mathcal{F}$. Prove that there is an invertible matrix $P$ such that $PAP^{-1}$ is diagonal for all $A \in \mathcal{F}$. Hint: proceed by induction on $n$ and use Problem 9.26 and the arguments in the solution of Problem 9.27.

12. (Functions of a diagonalizable matrix) Let $A \in M_n(F)$ be a diagonalizable matrix with $A = PDP^{-1}$ and $D$ diagonal with diagonal entries $d_{ii}$. Let $f : F \to F$ be any function and let $f(D)$ be the diagonal matrix with $(i, i)$ entry $f(d_{ii})$. Define $f(A) = Pf(D)P^{-1}$.

a) Prove that $f(A)$ is well defined. That is, if we diagonalize $A$ in a different way, we will get the same matrix $f(A)$. (Hint: there is a polynomial $p$ with $p(d_{ii}) = f(d_{ii})$ for all $i$.)

b) Prove that if $A$ is diagonalizable over $F = \mathbb{R}$ and $m$ is odd, then there is a diagonalizable matrix $B$ with $B^m = A$.


13. Let $A, B \in M_n(\mathbb{R})$ be diagonalizable matrices such that
$$AB^3 = B^3A.$$
Write $B = PDP^{-1}$ with $P$ invertible and $D$ diagonal.

a) Let $C = P^{-1}AP$. Prove that $CD^3 = D^3C$.
b) Prove that $CD = DC$. Hint: use the injectivity of the map $x \mapsto x^3$ ($x \in \mathbb{R}$).
c) Deduce that $AB = BA$.
14. Let $A \in M_n(\mathbb{C})$ and let $B = \begin{pmatrix} A & A \\ O_n & A \end{pmatrix} \in M_{2n}(\mathbb{C})$.

a) Prove that for all $P \in \mathbb{C}[X]$ we have
$$P(B) = \begin{pmatrix} P(A) & AP'(A) \\ O_n & P(A) \end{pmatrix}.$$

b) Deduce that $B$ is diagonalizable if and only if $A = O_n$.


15. Find all matrices $A \in M_n(\mathbb{R})$ such that $A^3 = A^2$ and the trace of $A$ equals $n$. Hint: prove that all complex eigenvalues of $A$ are equal to $1$ and then that $A$ is diagonalizable in $M_n(\mathbb{C})$.

16. Let $A, B \in M_n(\mathbb{R})$ be diagonalizable matrices such that $A^3 = B^3$. Prove that $A = B$. Hint: use Problems 13 and 9.27.

17. Let $V$ be a finite dimensional $\mathbb{C}$-vector space and let $T : V \to V$ be a linear transformation such that any subspace of $V$ which is stable under $T$ has a complement which is stable under $T$. Prove that $T$ is diagonalizable.

18. Let $V$ be a finite dimensional vector space over some field $F$ and let $T : V \to V$ be a diagonalizable linear transformation on $V$. Let $C(T)$ be the set of linear transformations $S : V \to V$ such that $S \circ T = T \circ S$.

a) Prove that a linear transformation $S : V \to V$ belongs to $C(T)$ if and only if $S$ leaves invariant each eigenspace of $T$.

b) Let $m_\lambda$ be the algebraic multiplicity of the eigenvalue $\lambda$ of $T$. Prove that $C(T)$ is an $F$-vector space of dimension $\sum_\lambda m_\lambda^2$, the sum being taken over all eigenvalues $\lambda$ of $T$.

c) Suppose that the eigenvalues of $T$ are pairwise distinct. Prove that $\mathrm{id}, T, T^2, \dots, T^{n-1}$ form a basis of $C(T)$ as $F$-vector space, where $n = \dim V$.


9.3 Some Applications of the Previous Ideas 


In this section we would like to come back to the technical result given by Theorem 9.15 and give some further nice applications of it. First of all, we will apply it to the resolution of linear differential equations with constant coefficients.

Consider the following classical problem in real analysis: given complex numbers $a_0, a_1, \dots, a_{n-1}$ and an open interval $I$ in $\mathbb{R}$, find all smooth functions $f : I \to \mathbb{C}$ such that
$$f^{(n)}(x) + a_{n-1}f^{(n-1)}(x) + \dots + a_1f'(x) + a_0f(x) = 0 \qquad (9.2)$$
for all $x \in I$. Here $f^{(i)}$ is the $i$th derivative of $f$.

It follows from elementary calculus that any solution of Eq. (9.2) is smooth, i.e., infinitely differentiable. Let $V$ be the space of smooth functions $f : I \to \mathbb{C}$.
Note that $V$ is an infinite dimensional vector space over $\mathbb{C}$, but we had no finiteness assumption in Theorem 9.15, so we can use it for this vector space. Consider the linear transformation $T$ sending a map $f$ in $V$ to its derivative
$$T : V \to V, \quad T(f) = f'.$$
Then $T^k(f) = f^{(k)}$ for all $k \ge 0$, thus solving Eq. (9.2) is equivalent to finding $\ker P(T)$, where
$$P(X) = X^n + a_{n-1}X^{n-1} + \dots + a_0$$
is the characteristic polynomial of Eq. (9.2). Since we work over the complex numbers, we can factor
$$P(X) = \prod_{i=1}^{d} (X - z_i)^{k_i}$$
for some positive integers $k_1, \dots, k_d$ and some pairwise distinct complex numbers $z_1, \dots, z_d$. By Theorem 9.15 we have
$$\ker P(T) = \bigoplus_{i=1}^{d} \ker(T - z_i \cdot \mathrm{id})^{k_i},$$

so it suffices to understand $\ker(T - z \cdot \mathrm{id})^k$, where $z$ is a complex number and $k$ is a positive integer. Let $g \in V$ be the map
$$g(x) = e^{zx},$$
so that $g' = zg$. Then for any $f \in V$ we have
$$(T - z \cdot \mathrm{id})(fg) = (fg)' - zfg = f'g,$$
thus by an immediate induction
$$(T - z \cdot \mathrm{id})^k(fg) = f^{(k)}g.$$
Take $h \in \ker(T - z \cdot \mathrm{id})^k$ and let $f = h/g$ (note that $g$ has no zero). Then the previous computation gives $f^{(k)} = 0$, that is $f$ is a polynomial map of degree less than $k$. Conversely, the same computation shows that any such $f$ gives rise to an element of $\ker(T - z \cdot \mathrm{id})^k$ (if multiplied by $g$). We conclude that $\ker(T - z \cdot \mathrm{id})^k$ consists of the maps $x \mapsto g(x)P(x)$, with $P$ a polynomial of degree at most $k - 1$ with complex coefficients, a basis of $\ker(T - z \cdot \mathrm{id})^k$ being given by the maps $x \mapsto x^je^{zx}$, where $0 \le j \le k - 1$.
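The computation $(T - z\cdot\mathrm{id})^k(fg) = f^{(k)}g$ can be checked exactly if we encode a function $x \mapsto Q(x)e^{zx}$ by the coefficient list of the polynomial $Q$ (lowest degree first). The Python sketch below uses this encoding, which together with the helper names is our own illustrative choice, to apply $P(T)$ for $P(X) = (X + 3)^2$, i.e. the equation $f'' + 6f' + 9f = 0$, to the maps $x^je^{-3x}$.

```python
def differentiate(q, z):
    """Given Q (coefficients, lowest degree first), return the coefficients of
    Q' + zQ, so that (Q(x) e^{zx})' = (Q' + zQ)(x) e^{zx}."""
    dq = [k * q[k] for k in range(1, len(q))] + [0]  # Q', padded to len(q)
    return [a + z * b for a, b in zip(dq, q)]


def apply_poly(coeffs, q, z):
    """Apply c0*id + c1*T + ... + cr*T^r (T = d/dx) to Q(x) e^{zx}."""
    result = [0] * len(q)
    power = list(q)  # coefficients of T^i(Q e^{zx}) divided by e^{zx}
    for c in coeffs:
        result = [r + c * p for r, p in zip(result, power)]
        power = differentiate(power, z)
    return result


def monomial(j):
    """Coefficient list of x^j."""
    return [0] * j + [1]


# P(X) = (X + 3)^2 = X^2 + 6X + 9, i.e. f'' + 6f' + 9f = 0, with z = -3 and k = 2.
P = [9, 6, 1]
for j in range(3):
    print(j, apply_poly(P, monomial(j), -3))
# 0 [0]
# 1 [0, 0]
# 2 [2, 0, 0]   (x^2 e^{-3x} is not a solution: P(T) sends it to 2 e^{-3x})
```

The output matches the text: $e^{-3x}$ and $xe^{-3x}$ are killed, while for $j = k = 2$ we get $f^{(2)}g = 2e^{-3x}$.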
Putting everything together, we obtain the following:


Theorem 9.31. Let $a_0, \dots, a_{n-1}$ be complex numbers and write
$$X^n + a_{n-1}X^{n-1} + \dots + a_0 = \prod_{i=1}^{d} (X - z_i)^{k_i},$$
with $z_1, \dots, z_d$ pairwise distinct.

a) The complex-valued solutions of the differential equation
$$f^{(n)} + a_{n-1}f^{(n-1)} + \dots + a_1f' + a_0f = 0$$
are the maps of the form
$$f(x) = \sum_{i=1}^{d} e^{z_ix}P_i(x),$$
where $P_i$ is a polynomial with complex coefficients whose degree does not exceed $k_i - 1$.

b) The set of complex-valued solutions of the previous differential equation is a vector space of dimension $n = k_1 + \dots + k_d$ over $\mathbb{C}$, a basis being given by the maps $x \mapsto x^je^{z_ix}$, where $1 \le i \le d$ and $0 \le j < k_i$.


We consider now the discrete analogue of the previous problem, namely linear recurrence sequences. Let $a_0, \dots, a_{d-1}$ be complex numbers and consider the set $S$ of sequences $(x_n)_{n \ge 0}$ of complex numbers such that
$$x_{n+d} = a_0x_n + a_1x_{n+1} + \dots + a_{d-1}x_{n+d-1}$$
for all $n \ge 0$. First of all, it is clear that an element of $S$ is uniquely determined by its first $d$ terms $x_0, \dots, x_{d-1}$, and that any choice of these $d$ terms extends, via the recurrence, to an element of $S$. In other words, the map
$$S \to \mathbb{C}^d, \quad (x_n)_{n \ge 0} \mapsto (x_0, x_1, \dots, x_{d-1}),$$
which is clearly linear, is bijective and so an isomorphism of vector spaces. We deduce that
$$\dim S = d.$$
We would like to describe explicitly the elements of $S$. We proceed as above, by working in the big space $V$ of all sequences $(x_n)_{n \ge 0}$ of complex numbers and considering the shift map
$$T : V \to V, \quad T((x_n)_{n \ge 0}) = (x_{n+1})_{n \ge 0}.$$
Note that $T$ is clearly a linear map and that
$$T^k((x_n)_{n \ge 0}) = (x_{n+k})_{n \ge 0}.$$
It follows that
$$S = \ker P(T), \quad \text{where } P(X) = X^d - a_{d-1}X^{d-1} - \dots - a_0$$
is the characteristic polynomial of the recurrence relation. As before, factorizing
$$P(X) = \prod_{i=1}^{p} (X - z_i)^{k_i}$$
we obtain using Theorem 9.15 that
$$S = \bigoplus_{i=1}^{p} \ker(T - z_i \cdot \mathrm{id})^{k_i}$$
and so the problem is reduced to understanding the space $\ker(T - z \cdot \mathrm{id})^k$ where $z$ is a complex number and $k$ is a positive integer.

Let us start with the case $z = 0$, i.e., understanding $\ker T^k$. We have $T^k((x_n)_{n \ge 0}) = 0$ if and only if $x_{n+k} = 0$ for all $n \ge 0$, i.e., the sequence $x_0, x_1, \dots$ becomes the zero sequence starting with index $k$. A basis of $\ker T^k$ is given by the sequences $x^{(0)}, \dots, x^{(k-1)}$, where $x^{(j)}$ is the sequence whose term of index $j$ is $1$ and all other terms are $0$.

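Assume now that $z \neq 0$. Let $x = (x_n)_{n \ge 0}$ be any sequence in $V$ and define a new sequence $y = (y_n)_{n \ge 0}$ by
$$y_n = \frac{x_n}{z^n}$$
for $n \ge 0$. One can easily check by induction on $j$ (the case $j = 1$ being $x_{n+1} - zx_n = z^{n+1}(y_{n+1} - y_n)$) that
$$(T - z \cdot \mathrm{id})^j(x) = \left(z^{n+j}\,(T - \mathrm{id})^j(y)_n\right)_{n \ge 0},$$
where $(T - \mathrm{id})^j(y)_n$ is the $n$th component of the sequence $(T - \mathrm{id})^j(y)$. It follows that
$$x \in \ker(T - z \cdot \mathrm{id})^k \quad \text{if and only if} \quad y \in \ker(T - \mathrm{id})^k.$$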
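The identity relating $(T - z\cdot\mathrm{id})^j(x)$ and $(T - \mathrm{id})^j(y)$ can be sanity-checked on a finite prefix of a sequence, since each application of $T - c\cdot\mathrm{id}$ only shortens the prefix by one term. The Python sketch below does this with exact fractions for $z = 3$ and an arbitrary hypothetical prefix $x$ (helper names ours); it is a check on examples, not a proof.

```python
from fractions import Fraction


def minus_shift(x, c):
    """(T - c*id) on a finite prefix of a sequence; the result is one term shorter."""
    return [x[n + 1] - c * x[n] for n in range(len(x) - 1)]


def identity_holds(x, z, j):
    """Compare (T - z id)^j(x) with (z^{n+j} ((T - id)^j y)_n)_n, where y_n = x_n / z^n."""
    y = [x[n] / z**n for n in range(len(x))]
    lhs, d = x, y
    for _ in range(j):
        lhs = minus_shift(lhs, z)
        d = minus_shift(d, 1)
    rhs = [z**(n + j) * d[n] for n in range(len(d))]
    return lhs == rhs


z = Fraction(3)
x = [Fraction(v) for v in [5, -2, 7, 1, 0, 4, 9, -3, 2, 8]]  # arbitrary prefix
print([identity_holds(x, z, j) for j in range(5)])  # [True, True, True, True, True]
```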
We are therefore reduced to understanding $\ker(T - \mathrm{id})^k$. If $k = 1$, a sequence $x = (x_n)_{n \ge 0}$ is in $\ker(T - \mathrm{id})$ if and only if $x_{n+1} - x_n = 0$ for $n \ge 0$, i.e., $x$ is a constant sequence. If $k = 2$, a sequence $x = (x_n)_{n \ge 0}$ is in $\ker(T - \mathrm{id})^2$ if and only if $(T - \mathrm{id})(x)$ is a constant sequence, i.e., the sequence $(x_{n+1} - x_n)_{n \ge 0}$ is constant, that is $x$ is an arithmetic sequence or equivalently
$$x_n = an + b$$
for some complex numbers $a, b$. In general, we have


Proposition 9.32. If $k$ is a positive integer, then $\ker(T - \mathrm{id})^k$ is the set of sequences of the form
$$x_n = a_0 + a_1n + \dots + a_{k-1}n^{k-1}$$
with $a_0, \dots, a_{k-1} \in \mathbb{C}$, a basis of it being given by the sequences
$$x^{(j)} = (n^j)_{n \ge 0}$$
for $0 \le j \le k - 1$ (with the convention that $0^0 = 1$).


Proof. It suffices to prove that the sequences $x^{(0)}, \dots, x^{(k-1)}$ form a basis of $\ker(T - \mathrm{id})^k$.

First, we prove that $x^{(0)}, \dots, x^{(k-1)}$ are linearly independent. Indeed, if not, then we can find complex numbers $u_0, \dots, u_{k-1}$, not all $0$, such that for all $n \ge 0$
$$u_0 + u_1n + \dots + u_{k-1}n^{k-1} = 0.$$
The polynomial $u_0 + u_1X + \dots + u_{k-1}X^{k-1}$ is then nonzero and has infinitely many roots, a contradiction.

Next, we prove that $x^{(j)} \in \ker(T - \mathrm{id})^k$ for $0 \le j \le k - 1$, by induction on $k$. This is clear for $k = 1$, and assuming that it holds for $k - 1$, it suffices (thanks to the inductive hypothesis and the inclusion $\ker(T - \mathrm{id})^{k-1} \subset \ker(T - \mathrm{id})^k$) to check that $x^{(k-1)} \in \ker(T - \mathrm{id})^k$, or equivalently that
$$(T - \mathrm{id})x^{(k-1)} = \left((n+1)^{k-1} - n^{k-1}\right)_{n \ge 0} \in \ker(T - \mathrm{id})^{k-1}.$$
But the binomial theorem shows that $((n+1)^{k-1} - n^{k-1})_{n \ge 0}$ is a linear combination of $x^{(0)}, \dots, x^{(k-2)}$, which all belong to $\ker(T - \mathrm{id})^{k-1}$ by the inductive hypothesis, hence the inductive step is proved.
To conclude, it suffices to prove that $\dim \ker(T - \mathrm{id})^k \le k$ for all $k$, which we do again by induction on $k$. This has already been seen for $k = 1$, and if it holds for $k - 1$, then the rank-nullity theorem applied to the map
$$T - \mathrm{id} : \ker(T - \mathrm{id})^k \to \ker(T - \mathrm{id})^{k-1}$$
yields
$$\dim \ker(T - \mathrm{id})^k \le \dim \ker(T - \mathrm{id}) + \dim \ker(T - \mathrm{id})^{k-1}.$$
Now $\dim \ker(T - \mathrm{id}) \le 1$ and by induction $\dim \ker(T - \mathrm{id})^{k-1} \le k - 1$, hence $\dim \ker(T - \mathrm{id})^k \le k$ and the inductive step is completed. The proposition is finally proved. □
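Only finitely many terms can be stored, so the following Python sketch works with prefixes of sequences: each application of $T - \mathrm{id}$ (a forward difference) shortens the prefix by one term. It checks on a few cases that $(T - \mathrm{id})^{j+1}$ kills $x^{(j)} = (n^j)_{n \ge 0}$, while $(T - \mathrm{id})^j x^{(j)}$ is the nonzero constant sequence $j!$ (a standard fact about iterated differences that the text does not state). Checking a prefix is only a sanity check, not a proof; helper names are ours.

```python
def delta(x):
    """(T - id)(x) on a finite prefix: the forward difference x_{n+1} - x_n."""
    return [x[n + 1] - x[n] for n in range(len(x) - 1)]


def delta_power(x, k):
    """(T - id)^k(x) on a finite prefix (k terms shorter)."""
    for _ in range(k):
        x = delta(x)
    return x


N = 12  # number of terms kept
for j in range(4):
    x = [n**j for n in range(N)]  # the sequence x^(j); in Python 0**0 == 1 as well
    killed = all(v == 0 for v in delta_power(x, j + 1))
    print(j, killed, delta_power(x, j)[0])
# 0 True 1
# 1 True 1
# 2 True 2
# 3 True 6
```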


We can now put everything together and the previous discussion yields the following beautiful result:

Theorem 9.33. Let $a_0, \dots, a_{d-1}$ be complex numbers. Consider the polynomial
$$P(X) = X^d - a_{d-1}X^{d-1} - \dots - a_0 = \prod_{i=1}^{p} (X - z_i)^{k_i},$$
with $z_1, \dots, z_p$ pairwise distinct, and assume for simplicity that $a_0 \neq 0$, so that all $z_i$ are nonzero. Let $S$ be the set of sequences $(x_n)_{n \ge 0}$ of complex numbers such that
$$x_{n+d} = a_0x_n + a_1x_{n+1} + \dots + a_{d-1}x_{n+d-1}$$
for all $n \ge 0$.

a) A sequence $(x_n)_{n \ge 0}$ is in $S$ if and only if there are polynomials $Q_i$ with complex coefficients, of degree not exceeding $k_i - 1$, such that for all $n$
$$x_n = Q_1(n)z_1^n + \dots + Q_p(n)z_p^n.$$

b) $S$ is a vector space of dimension $d$ over $\mathbb{C}$, a basis being given by the sequences $(n^jz_i^n)_{n \ge 0}$, for $1 \le i \le p$ and $0 \le j < k_i$.
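Theorem 9.33 can be compared with direct iteration of the recurrence. The Python sketch below (function name ours; the recurrence $x_{n+2} = -4x_n + 4x_{n+1}$ and the initial terms are a hypothetical example) uses $P(X) = X^2 - 4X + 4 = (X - 2)^2$, for which part a) predicts $x_n = (c_0 + c_1n)2^n$, the constants being fixed by $x_0$ and $x_1$.

```python
from fractions import Fraction


def recurrence_terms(a, initial, count):
    """First `count` terms of x_{n+d} = a[0] x_n + ... + a[d-1] x_{n+d-1}."""
    d = len(a)
    x = list(initial)
    while len(x) < count:
        n = len(x) - d
        x.append(sum(a[i] * x[n + i] for i in range(d)))
    return x[:count]


# x_{n+2} = -4 x_n + 4 x_{n+1}: P(X) = (X - 2)^2, so p = 1, z_1 = 2, k_1 = 2.
x0, x1 = 3, 10
terms = recurrence_terms([-4, 4], [x0, x1], 20)

# Predicted form x_n = (c0 + c1 n) 2^n; n = 0, 1 give c0 = x0 and c1 = x1/2 - x0.
c0, c1 = Fraction(x0), Fraction(x1, 2) - x0
closed = [(c0 + c1 * n) * 2**n for n in range(20)]

print(terms[:5])        # [3, 10, 28, 72, 176]
print(terms == closed)  # True
```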


We promised that we would use the ideas developed in this chapter to give a very natural and simple proof of the Cayley–Hamilton theorem for matrices with complex entries. It is now time to honor our promise! We will need some topological preliminaries, however.

A sequence of matrices $(A_k)_{k \ge 0}$ in $M_n(\mathbb{C})$ converges to a matrix $A \in M_n(\mathbb{C})$ (which we denote by $A_k \to A$) if for all $i, j \in [1, n]$ the sequence with general term the $(i, j)$-entry of $A_k$ converges (as a sequence of complex numbers) to the $(i, j)$-entry of $A$. Equivalently, the sequence $(A_k)_{k \ge 0}$ converges to $A$ if for all $\varepsilon > 0$ we have
$$\max_{1 \le i, j \le n} |(A_k)_{ij} - A_{ij}| < \varepsilon$$
for all $k$ large enough (depending on $\varepsilon$). We leave it to the reader to check that if $A_k \to A$ and $B_k \to B$, then $A_k + B_k \to A + B$ and $A_k \cdot B_k \to A \cdot B$. Finally, a subset $S$ of $M_n(\mathbb{C})$ is dense in $M_n(\mathbb{C})$ if for any matrix $A \in M_n(\mathbb{C})$ there is a sequence of elements of $S$ which converges to $A$. That is, any matrix in $M_n(\mathbb{C})$ is the limit of a suitable sequence of matrices in $S$.

The following fundamental result makes the importance of diagonalizable 
matrices fairly clear. 




Theorem 9.34. The set of diagonalizable matrices in $M_n(\mathbb{C})$ is dense in $M_n(\mathbb{C})$. In other words, any matrix $A \in M_n(\mathbb{C})$ is the limit of a sequence of diagonalizable matrices.


Proof. Suppose first that $T$ is an upper-triangular matrix with entries $t_{ij}$. Consider the sequence $(T_k)_{k \ge 1}$ of matrices, where all entries of $T_k$ except the diagonal ones are equal to the corresponding entries in $T$, and for which the diagonal entries of $T_k$ are $t_{ii} + \frac{i}{k}$. Clearly, $\lim_{k \to \infty} T_k = T$. We claim that if $k$ is large enough, then $T_k$ is diagonalizable. It suffices to check that the eigenvalues of $T_k$ are pairwise distinct. But since $T_k$ is upper-triangular, its eigenvalues are its diagonal entries. Thus it suffices to check that for $k$ large enough the numbers $t_{11} + \frac{1}{k}, t_{22} + \frac{2}{k}, \dots, t_{nn} + \frac{n}{k}$ are pairwise distinct. This is clear: if $t_{ii} = t_{jj}$ with $i \neq j$, then $t_{ii} + \frac{i}{k} \neq t_{jj} + \frac{j}{k}$ for every $k$, while if $t_{ii} \neq t_{jj}$, these two numbers differ as soon as $\frac{|i - j|}{k} < |t_{ii} - t_{jj}|$.

Now let $A \in M_n(\mathbb{C})$ be an arbitrary matrix. By Corollary 9.5 we can write $A = PTP^{-1}$ for some invertible matrix $P$ and some upper-triangular matrix $T$. By the previous paragraph, there is a sequence $(D_k)_{k \ge 1}$ of diagonalizable matrices which converges to $T$. Then $D_k' := PD_kP^{-1}$ is a sequence of diagonalizable matrices which converges to $A$. Thus any matrix is a limit of diagonalizable matrices and the theorem is proved. □
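The perturbation of the first paragraph is easy to experiment with. In the Python sketch below, the diagonal $(t_{11}, t_{22}, t_{33}) = (1, \tfrac{3}{2}, 1)$ is a hypothetical example chosen so that a collision occurs, and the helper names are ours. It lists the values $k \le 100$ for which $t_{11} + \frac{1}{k}, t_{22} + \frac{2}{k}, t_{33} + \frac{3}{k}$ fail to be pairwise distinct; since only the diagonal determines the eigenvalues of an upper-triangular matrix, the off-diagonal entries are omitted.

```python
from fractions import Fraction


def perturbed_diagonal(diag, k):
    """Diagonal entries t_ii + i/k of T_k, with i = 1, ..., n."""
    return [t + Fraction(i, k) for i, t in enumerate(diag, start=1)]


def pairwise_distinct(values):
    return len(set(values)) == len(values)


# Hypothetical diagonal with a repeated entry: t_11 = t_33 = 1.
diag = [Fraction(1), Fraction(3, 2), Fraction(1)]
bad = [k for k in range(1, 101) if not pairwise_distinct(perturbed_diagonal(diag, k))]
print(bad)  # [2]: for k = 2 the entries are 3/2, 5/2, 5/2
```

The repeated entry is separated for every $k$, but the two distinct entries $\tfrac{3}{2}$ and $1$ collide once, at $k = 2$, which is why the proof only claims distinctness for $k$ large enough.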


Remark 9.35. a) We can restate the theorem as follows: given any matrix $A = [a_{ij}] \in M_n(\mathbb{C})$ and any $\varepsilon > 0$, we can find a diagonalizable matrix $B = [b_{ij}] \in M_n(\mathbb{C})$ such that
$$\max_{1 \le i, j \le n} |a_{ij} - b_{ij}| < \varepsilon.$$

b) This result is completely false over the real numbers: for $n \ge 2$, the diagonalizable matrices in $M_n(\mathbb{R})$ are not dense in $M_n(\mathbb{R})$. The reason is that the characteristic polynomial of a diagonalizable matrix is split. One can prove that if $\lim_{k \to \infty} A_k = A$ and $A_k$ is diagonalizable for all $k$, then the characteristic polynomial of $A$ is split. Conversely, if this happens, then $A$ is trigonalizable in $M_n(\mathbb{R})$ and the proof of the previous theorem easily yields that $A$ is a limit of diagonalizable matrices in $M_n(\mathbb{R})$. We deduce that the trigonalizable matrices in $M_n(\mathbb{R})$ are precisely the limits of sequences of diagonalizable matrices in $M_n(\mathbb{R})$. In other words, a matrix $A \in M_n(\mathbb{R})$ is trigonalizable if and only if it can be approximated to any precision by a diagonalizable matrix in $M_n(\mathbb{R})$.


Using the previous theorem, we can give a very simple and natural proof of the 
Cayley—Hamilton theorem. 


Theorem 9.36 (Cayley–Hamilton). For any matrix $A \in M_n(\mathbb{C})$ we have $\chi_A(A) = O_n$, that is $A$ is annihilated by its characteristic polynomial.

Proof. If $A$ is diagonal, the result is clear: if $a_1, \dots, a_n$ are the diagonal entries of $A$, then $\chi_A(X) = (X - a_1)\cdots(X - a_n)$ and clearly this polynomial annihilates $A$ (since $\chi_A(A)$ is the diagonal matrix with entries $\chi_A(a_1), \dots, \chi_A(a_n)$, all of which vanish).
Next, suppose that $A$ is diagonalizable. Thus we can write $A = PDP^{-1}$ for an invertible matrix $P$ and a diagonal matrix $D$. Since $A$ and $D$ are similar, we have $\chi_A = \chi_D$. So we need to check that $\chi_D(A) = O_n$. But
$$\chi_D(A) = \chi_D(PDP^{-1}) = P\chi_D(D)P^{-1} = O_n,$$
the last equality being a consequence of the first paragraph and the equality $\chi_D(PDP^{-1}) = P\chi_D(D)P^{-1}$ being a consequence of linearity and of the equality $(PDP^{-1})^k = PD^kP^{-1}$ for all $k \ge 0$.

Finally, let $A \in M_n(\mathbb{C})$ be arbitrary. By Theorem 9.34 there is a sequence $(A_k)_{k \ge 1}$ of diagonalizable matrices such that $A_k \to A$. The coefficients of $\chi_{A_k}$ are polynomial expressions in the entries of $A_k$, and since $\lim_{k \to \infty} A_k = A$, it follows that the coefficient of $X^d$ in $\chi_{A_k}$ converges to the coefficient of $X^d$ in $\chi_A$ for all $d \le n$. Now write
$$\chi_{A_k}(X) = a_0(k) + a_1(k)X + \dots + a_n(k)X^n, \quad \chi_A(X) = a_0 + a_1X + \dots + a_nX^n.$$
By the previous paragraph we know that
$$a_0(k)I_n + a_1(k)A_k + \dots + a_n(k)A_k^n = O_n$$
for all $k$. Passing to the limit and using the fact that $A_k^i \to A^i$ and $a_i(k) \to a_i$ for all $i \ge 0$, we deduce that
$$\chi_A(A) = a_0I_n + a_1A + \dots + a_nA^n = \lim_{k \to \infty}\left(a_0(k)I_n + a_1(k)A_k + \dots + a_n(k)A_k^n\right) = O_n,$$
finishing the proof of the theorem. □
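The theorem is easy to test on examples. The Python sketch below computes $\chi_A$ exactly with the Faddeev–LeVerrier recursion (a standard algorithm that this chapter does not discuss, swapped in here only to obtain the coefficients) and then evaluates $\chi_A(A)$ by Horner's rule; the helper names are ours. The test matrix is the one of Example 9.42, which is not diagonalizable. Such a check illustrates the theorem on one matrix and of course proves nothing.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def charpoly(A):
    """Coefficients [c0, c1, ..., cn] of det(X I_n - A), computed with the
    Faddeev-LeVerrier recursion in exact rational arithmetic."""
    n = len(A)
    c = [Fraction(0)] * n + [Fraction(1)]       # c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        M = mat_mul(A, M)                       # M_k = A M_{k-1} + c_{n-k+1} I
        for i in range(n):
            M[i][i] += c[n - k + 1]
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k  # c_{n-k} = -tr(A M_k) / k
    return c


def poly_eval(coeffs, M):
    """c0 I + c1 M + ... + cr M^r by Horner's rule."""
    n = len(M)
    R = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        R = mat_mul(R, M)
        for i in range(n):
            R[i][i] += c
    return R


A = [[1, 0, 0, 0, 2],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [-1, 0, 0, 0, -2]]
chi = charpoly(A)
print([int(v) for v in chi])                             # [0, 0, 0, 0, 1, 1], i.e. X^5 + X^4
print(poly_eval(chi, A) == [[0] * 5 for _ in range(5)])  # True
```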


Remark 9.37. a) The second half of the proof of the previous theorem essentially proves that if a polynomial equation on $\mathbb{C}^{n^2}$ holds on a dense subset, then it holds everywhere. The reader is strongly advised to convince himself that he can adapt the argument to prove this very useful result.

b) In fact by using some deep facts from algebra one can show that the Cayley–Hamilton theorem for the field $\mathbb{C}$ just proven implies the Cayley–Hamilton theorem over an arbitrary field. One first needs to know that one can choose $n^2$ elements $x_{ij}$ of $\mathbb{C}$ such that there is no nonzero polynomial equation with integer coefficients (in $n^2$ variables) satisfied by the $x_{ij}$ (this generalizes the notion of a transcendental number, which is a number that satisfies no nonzero polynomial equation with integer coefficients). Then since the Cayley–Hamilton theorem holds for the matrix with entries $x_{ij}$, we conclude that each entry of $\chi_A(A)$, viewed as a polynomial with integer coefficients in the entries of $A$, gives a polynomial identity in $n^2$ indeterminates which holds over the integers. Second, one needs to know that for any field $F$ and any $n^2$ elements $a_{ij}$ of $F$ there is a morphism (a map respecting addition and multiplication) from $\mathbb{Z}[x_{11}, \dots, x_{nn}]$ to $F$ taking $x_{ij}$ to $a_{ij}$. Thus each entry also vanishes in $F$ for the matrix $A = (a_{ij})$ and the Cayley–Hamilton theorem holds for $F$.
We will end this chapter by explaining how we can combine the ideas seen so far with Jordan's Theorems 6.40 and 6.41 to obtain a classification up to similarity of all matrices $A \in M_n(\mathbb{C})$ (Theorems 6.40 and 6.41 classified nilpotent matrices up to similarity).

Suppose that $V$ is a finite dimensional vector space over a field $F$ and that $T : V \to V$ is a trigonalizable linear transformation on $V$. Recall that this is equivalent to saying that the characteristic polynomial of $T$ is split over $F$. For instance, if $F = \mathbb{C}$, then any linear transformation on $V$ is trigonalizable. Let
$$\chi_T(X) = \prod_{i=1}^{d} (X - \lambda_i)^{k_i}$$
be the factorization of the characteristic polynomial of $T$, with $\lambda_1, \dots, \lambda_d \in F$ pairwise distinct and $k_1, \dots, k_d$ positive integers. Thus $k_i$ is the algebraic multiplicity of the eigenvalue $\lambda_i$.

By the Cayley–Hamilton theorem $\chi_T(T) = 0$, thus Theorem 9.15 yields
$$V = \bigoplus_{i=1}^{d} \ker(T - \lambda_i \cdot \mathrm{id})^{k_i}.$$
We call the subspace
$$C_i = \ker(T - \lambda_i \cdot \mathrm{id})^{k_i}$$
the characteristic subspace of $\lambda_i$. Note that the $\lambda_i$-eigenspace is a subspace of $C_i$ and that the previous relation can be written as
$$V = \bigoplus_{i=1}^{d} C_i.$$
Since $T$ commutes with $(T - \lambda_i \cdot \mathrm{id})^{k_i}$, $T$ leaves invariant $C_i = \ker(T - \lambda_i \cdot \mathrm{id})^{k_i}$, thus each characteristic subspace $C_i$ is stable under $T$.

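Let $T_i$ be the restriction of $T - \lambda_i \cdot \mathrm{id}$ to $C_i$. By definition, $T_i^{k_i} = 0$, thus $T_i$ is a nilpotent transformation on $C_i$, of index not exceeding $k_i$. Thus $T_i$ is classified up to similarity by a Jordan matrix, that is there is a basis $\mathcal{B}_i$ of $C_i$ in which the matrix of $T_i$ is
$$\begin{pmatrix} J_{k_{1,i}} & O & \dots & O \\ O & J_{k_{2,i}} & \dots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \dots & J_{k_{n_i,i}} \end{pmatrix}$$
for a sequence $k_{1,i} \ge \dots \ge k_{n_i,i}$ of positive integers adding up to $\dim C_i$. We recall that
$$J_k = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ 0 & 0 & 0 & \dots & 0 \end{pmatrix} \in M_k(F)$$
is the Jordan block of size $k$.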
Definition 9.38. If $\lambda \in F$, we let
$$J_n(\lambda) = \lambda I_n + J_n \in M_n(F)$$
be the Jordan block of size $n$ associated with $\lambda$.
The previous discussion naturally leads to 


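Theorem 9.39 (Jordan). Let $T : V \to V$ be a trigonalizable linear transformation on a finite dimensional vector space $V$ of dimension $n$. Then there is a basis of $V$ in which the matrix of $T$ is of the form
$$\begin{pmatrix} J_{k_1}(\lambda_1) & O & \dots & O \\ O & J_{k_2}(\lambda_2) & \dots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \dots & J_{k_d}(\lambda_d) \end{pmatrix}$$
for some positive integers $k_1, \dots, k_d$ adding up to $n$ and some $\lambda_1, \dots, \lambda_d \in F$.

Proof. With notations as above, the restriction of $T$ to $C_i$ is $T_i + \lambda_i \cdot \mathrm{id}$, so its matrix in the basis $\mathcal{B}_i$ is block-diagonal with diagonal blocks $J_{k_{1,i}}(\lambda_i), \dots, J_{k_{n_i,i}}(\lambda_i)$. Patching these bases $\mathcal{B}_i$ yields a basis of $V$ in which the matrix of $T$ has the desired form. □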


We can restate the previous theorem in terms of matrices: 


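Theorem 9.40 (Jordan). Any trigonalizable matrix $A \in M_n(F)$ is similar to a matrix of the form
$$\begin{pmatrix} J_{k_1}(\lambda_1) & O & \dots & O \\ O & J_{k_2}(\lambda_2) & \dots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \dots & J_{k_d}(\lambda_d) \end{pmatrix}$$
for some positive integers $k_1, \dots, k_d$ adding up to $n$ and some $\lambda_1, \dots, \lambda_d \in F$.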
The story isn't quite finished: we would like to know when two block-diagonal matrices as in the theorem are similar, in other words we would like to know if $\lambda_1, \dots, \lambda_d$ and $k_1, \dots, k_d$ are determined (up to the order of the blocks) by the similarity class of the matrix
$$\begin{pmatrix} J_{k_1}(\lambda_1) & O & \dots & O \\ O & J_{k_2}(\lambda_2) & \dots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \dots & J_{k_d}(\lambda_d) \end{pmatrix}. \qquad (*)$$


Suppose that $A$ is a matrix similar to the matrix $(*)$. Then the characteristic polynomial of $A$ is
$$\chi_A(X) = \prod_{i=1}^{d} \chi_{J_{k_i}(\lambda_i)}(X).$$
Now, since $J_n$ is nilpotent we have $\chi_{J_n}(X) = X^n$ and so
$$\chi_{J_n(\lambda)}(X) = (X - \lambda)^n.$$
It follows that
$$\chi_A(X) = \prod_{i=1}^{d} (X - \lambda_i)^{k_i}$$
and so necessarily $\lambda_1, \dots, \lambda_d$ are all eigenvalues of $A$. Note that we did not assume that $\lambda_1, \dots, \lambda_d$ are pairwise distinct, thus we cannot conclude from the previous equality that $k_1, \dots, k_d$ are the algebraic multiplicities of the eigenvalues of $A$. This is not true in general: several Jordan blocks corresponding to a given eigenvalue may appear. The problem of uniqueness is completely solved by the following:


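Theorem 9.41. Suppose that a matrix $A \in M_n(F)$ is similar to
$$\begin{pmatrix} J_{k_1}(\lambda_1) & O & \dots & O \\ O & J_{k_2}(\lambda_2) & \dots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \dots & J_{k_d}(\lambda_d) \end{pmatrix}$$
for some positive integers $k_1, \dots, k_d$ adding up to $n$ and some $\lambda_1, \dots, \lambda_d \in F$.

Then

a) Each $\lambda_i$ is an eigenvalue of $A$.
b) For each eigenvalue $\lambda$ of $A$ and each positive integer $m$, the number of Jordan blocks $J_m(\lambda)$ among $J_{k_1}(\lambda_1), \dots, J_{k_d}(\lambda_d)$ is
$$N_m(\lambda) = \operatorname{rank}(A - \lambda I_n)^{m+1} - 2\operatorname{rank}(A - \lambda I_n)^m + \operatorname{rank}(A - \lambda I_n)^{m-1}$$
and depends only on the similarity class of $A$.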
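Proof. We have already seen part a). The proof of part b) is very similar to the solution of Problem 6.43. More precisely, let $B = A - \lambda I_n$ and observe that $B^m$ is similar to
$$\begin{pmatrix} (J_{k_1}(\lambda_1) - \lambda I_{k_1})^m & O & \dots & O \\ O & (J_{k_2}(\lambda_2) - \lambda I_{k_2})^m & \dots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \dots & (J_{k_d}(\lambda_d) - \lambda I_{k_d})^m \end{pmatrix},$$
thus
$$\operatorname{rank}(B^m) = \sum_{i=1}^{d} \operatorname{rank}(J_{k_i}(\lambda_i) - \lambda I_{k_i})^m.$$
Now, the rank of $(J_n(\mu) - \lambda I_n)^m$ is

• $n$ if $\lambda \neq \mu$, as in this case
$$J_n(\mu) - \lambda I_n = J_n + (\mu - \lambda)I_n$$
is invertible (it is upper-triangular with nonzero diagonal entries),
• $n - m$ for $\lambda = \mu$ and $m < n$, as follows from Problem 6.42,
• $0$ for $\lambda = \mu$ and $m \ge n$, as $J_n^n = O_n$.

Hence, if $N_m(\lambda)$ is the number of Jordan blocks $J_m(\lambda)$ among $J_{k_1}(\lambda_1), \dots, J_{k_d}(\lambda_d)$, then
$$\operatorname{rank}(B^m) = \sum_{\substack{\lambda_i = \lambda \\ k_i > m}} (k_i - m) + \sum_{\lambda_i \neq \lambda} k_i;$$
subtracting these relations for $m - 1$ and $m$ yields
$$\operatorname{rank}(B^{m-1}) - \operatorname{rank}(B^m) = \sum_{\substack{\lambda_i = \lambda \\ k_i \ge m}} 1$$
and finally
$$\operatorname{rank}(B^{m-1}) - 2\operatorname{rank}(B^m) + \operatorname{rank}(B^{m+1}) = \left(\operatorname{rank}(B^{m-1}) - \operatorname{rank}(B^m)\right) - \left(\operatorname{rank}(B^m) - \operatorname{rank}(B^{m+1})\right) = \sum_{\substack{\lambda_i = \lambda \\ k_i = m}} 1 = N_m(\lambda),$$
as desired. □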
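The three rank values used in the proof can be observed on a single block. The Python sketch below (helper names ours, exact arithmetic via `fractions.Fraction`) computes $\operatorname{rank}(J_4(\mu) - \lambda I_4)^m$ for $m = 0, \dots, 5$, first with $\lambda = \mu = 5$ and then with $\lambda = 7 \neq \mu$.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, d):
    """A^d by repeated multiplication (A^0 is the identity)."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(d):
        R = mat_mul(R, A)
    return R


def rank(M):
    """Rank via exact Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def jordan_block(n, mu):
    """J_n(mu) = mu I_n + J_n: mu on the diagonal, 1 just above it."""
    return [[mu if j == i else (1 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]


def shifted_power_rank(n, mu, lam, m):
    """rank of (J_n(mu) - lam I_n)^m."""
    J = jordan_block(n, mu)
    B = [[J[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return rank(mat_pow(B, m))


print([shifted_power_rank(4, 5, 5, m) for m in range(6)])  # [4, 3, 2, 1, 0, 0]
print([shifted_power_rank(4, 5, 7, m) for m in range(6)])  # [4, 4, 4, 4, 4, 4]
```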


Note that if an eigenvalue $\lambda$ has algebraic multiplicity $1$, then there is a single Jordan block attached to $\lambda$, and it has size $1$.
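Example 9.42. Consider the matrix
$$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & -2 \end{pmatrix}.$$
We compute $\chi_A(X)$ by expanding $\det(XI_5 - A)$ with respect to the third row and obtain (using again an expansion with respect to the second row in the new determinant)
$$\chi_A(X) = X\begin{vmatrix} X-1 & 0 & 0 & -2 \\ 0 & X & 0 & 0 \\ 0 & -1 & X & 0 \\ 1 & 0 & 0 & X+2 \end{vmatrix} = X^2\begin{vmatrix} X-1 & 0 & -2 \\ 0 & X & 0 \\ 1 & 0 & X+2 \end{vmatrix} = X^3\begin{vmatrix} X-1 & -2 \\ 1 & X+2 \end{vmatrix} = X^4(X+1).$$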


The eigenvalue $-1$ has algebraic multiplicity $1$, thus there is a single Jordan block associated with this eigenvalue, of size $1$. Let us deal now with the eigenvalue $0$, which has algebraic multiplicity $4$. Let $N_m$ be the number of Jordan blocks of size $m$ associated with this eigenvalue. By the previous theorem
$$N_1 = \operatorname{rank}(A^2) - 2\operatorname{rank}(A) + 5, \quad N_2 = \operatorname{rank}(A^3) - 2\operatorname{rank}(A^2) + \operatorname{rank}(A)$$
and so on. One easily checks that $A$ has rank $3$. Next, one computes
$$A^2 = \begin{pmatrix} -1 & 0 & 0 & 0 & -2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 2 \end{pmatrix}, \quad A^3 = \begin{pmatrix} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & -2 \end{pmatrix}.$$
Note that $A^2$ has rank $2$ (it is apparent that a basis of the space spanned by its rows is given by the first and fourth row) and $A^3$ has rank $1$. Thus
$$N_1 = 2 - 2 \cdot 3 + 5 = 1,$$
thus there is one Jordan block of size $1$ and
$$N_2 = 1 - 2 \cdot 2 + 3 = 0,$$
thus there is no Jordan block of size $2$. Since the sum of the sizes of the Jordan blocks associated with the eigenvalue $0$ is $4$, and since we already know that there is a block of size $1$ and no block of size $2$, we deduce that there is one block of size $3$ and so the Jordan canonical form of $A$ is
$$\begin{pmatrix} -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
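The rank computations of this example can be automated. The following Python sketch evaluates the formula of Theorem 9.41 b) with exact Gaussian elimination; `block_count` and the other helper names are ours, and the check only confirms the numbers found above for this particular matrix.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, d):
    """A^d by repeated multiplication (A^0 is the identity)."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(d):
        R = mat_mul(R, A)
    return R


def rank(M):
    """Rank via exact Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def block_count(A, lam, m):
    """N_m(lam) = rank(B^{m+1}) - 2 rank(B^m) + rank(B^{m-1}), with B = A - lam I_n."""
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

    def r(k):
        return rank(mat_pow(B, k))

    return r(m + 1) - 2 * r(m) + r(m - 1)


A = [[1, 0, 0, 0, 2],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [-1, 0, 0, 0, -2]]
print([block_count(A, 0, m) for m in range(1, 5)])   # [1, 0, 1, 0]
print([block_count(A, -1, m) for m in range(1, 3)])  # [1, 0]
```

For the eigenvalue $0$ this gives one block of size $1$, none of size $2$ and one of size $3$; for $-1$ a single block of size $1$, in agreement with the Jordan form above.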


9.3.1 Problems for Practice 


1. Given a real number $\omega$ and two real numbers $a, b$, find all twice differentiable functions $f : \mathbb{R} \to \mathbb{R}$ satisfying $f(0) = a$, $f'(0) = b$ and
$$f'' + \omega^2 f = 0.$$

2. Find all smooth functions $f : \mathbb{R} \to \mathbb{R}$ such that $f(0) = 1$, $f'(0) = 0$, $f''(0) = 0$ and
$$f''' + f'' + f' + f = 0.$$


3. Let $V$ be a finite dimensional $F$-vector space and let $T : V \to V$ be a linear transformation such that $T^3 = \mathrm{id}$.

a) Prove that $V = \ker(T - \mathrm{id}) \oplus \ker(T^2 + T + \mathrm{id})$.
b) Prove that
$$\operatorname{rank}(T - \mathrm{id}) = \dim \ker(T^2 + T + \mathrm{id}).$$
c) Deduce that
$$V = \ker(T - \mathrm{id}) \oplus \operatorname{Im}(T - \mathrm{id}).$$
4. Describe the sequences $(x_n)_{n \ge 0}$ of complex numbers such that
$$x_{n+4} + x_{n+3} - x_{n+1} - x_n = 0$$
for all $n \ge 0$.
5. Find the Jordan canonical form of the matrix
$$A = \begin{pmatrix} 1 & 0 & -3 \\ 1 & -1 & -6 \\ -1 & 2 & 5 \end{pmatrix}.$$

6. Compute the Jordan canonical form of the matrix
$$A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.$$


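7. Consider the matrix
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

a) Prove that $A^3 = O_6$ and find the characteristic polynomial of $A$.
b) Find the Jordan canonical form of $A$.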


8. What are the possible Jordan forms of a matrix whose characteristic polynomial is $(X - 1)(X - 2)^2$?

9. Consider a matrix $A \in M_6(\mathbb{C})$ of rank $4$ whose minimal polynomial is $X(X - 1)(X - 2)^2$.
a) What are the eigenvalues of $A$?
b) Is $A$ diagonalizable?
c) What are the possible forms of the Jordan canonical form of $A$?


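10. Prove that any matrix similar to a matrix of the form
$$\begin{pmatrix} J_{k_1}(\lambda_1) & O & \dots & O \\ O & J_{k_2}(\lambda_2) & \dots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \dots & J_{k_d}(\lambda_d) \end{pmatrix}$$
is trigonalizable (this is a converse to Jordan's theorem).
11. a) What is the minimal polynomial of $J_n(\lambda)$ when $\lambda \in \mathbb{C}$ and $n \ge 1$?
b) Explain how we can compute the minimal polynomial of a matrix in terms of its Jordan canonical form.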
12. Prove that two matrices $A, B \in M_n(\mathbb{C})$ are similar if and only if $P(A)$ and $P(B)$ have the same rank for all polynomials $P \in \mathbb{C}[X]$.

13. Use Jordan's theorem to prove that any matrix $A \in M_n(\mathbb{C})$ is similar to its transpose.

14. a) Prove that if $A \in M_n(\mathbb{C})$ is similar to $2A$, then $A$ is nilpotent.

b) Use Jordan's theorem to prove that if $A \in M_n(\mathbb{C})$ is nilpotent then $A$ is similar to $2A$.

15. Let $T : V \to V$ be a trigonalizable linear transformation on a finite dimensional vector space $V$ over a field $F$. Let
$$\chi_T(X) = \prod_{i=1}^{d} (X - \lambda_i)^{k_i}$$
be the factorization of its characteristic polynomial and let
$$C_i = \ker(T - \lambda_i \cdot \mathrm{id})^{k_i}$$
be the characteristic subspace of $\lambda_i$.

a) Prove that $\ker(T - \lambda_i \cdot \mathrm{id})^k = C_i$ for all $k \ge k_i$. Hint: use Theorem 9.15 to show that $V = \ker(T - \lambda_i \cdot \mathrm{id})^k \oplus \bigoplus_{j \neq i} C_j$, then take dimensions.
b) Prove that
$$\dim C_i = k_i.$$
Hint: consider the matrix of $T$ with respect to a basis of $V$ obtained by patching a basis of $C_i$ and a basis of a complementary subspace of $C_i$. What is its characteristic polynomial?

c) Prove that the smallest positive integer $k$ for which
$$\ker(T - \lambda_i \cdot \mathrm{id})^k = C_i$$
is the multiplicity of $\lambda_i$ as root of the minimal polynomial of $T$.


16. (The Dunford–Jordan decomposition) a) Using Jordan's theorem, prove that any trigonalizable linear transformation $T : V \to V$ on a finite dimensional vector space is the sum of a diagonalizable and of a nilpotent transformation, the two transformations commuting with each other.

b) State the result obtained in a) in terms of matrices.
c) Conversely, prove that the sum of a nilpotent and of a diagonalizable transformation which commute with each other is trigonalizable.


(More on the Dunford—Jordan decomposition) Let T : V — V bea 


trigonalizable linear transformation with 


d 


xr(T) = [|] -2)" 


i=l 
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as in Problem 15. Let C; be the characteristic subspace of the eigenvalue Aj. 
We define the A;-spectral projection 7), as the projection of V onto C; along 
®;4iC;. Thus by definition if v € V is written as v; +... + va with v; € Ci, 
then 


qty; (Vv) = Ci. 


a) Use the proof of Theorem 9.15 to show that 
Ta; E FIT}. 


b) Let

$$D = \sum_{i=1}^{d} \lambda_i \pi_{\lambda_i}.$$

Prove that $D$ is a diagonalizable linear transformation on $V$, that $N = T - D$
is nilpotent and $N \circ D = D \circ N$. Thus $D, N$ give a Dunford–Jordan
decomposition of $T$.

c) Prove that D and N are in F[T]. 

d) Deduce from part c) that if $D'$ is diagonalizable, $N'$ is nilpotent, $D'$ and $N'$
commute and $D' + N' = T$, then $D' = D$ and $N' = N$. In other words,
the pair $(D, N)$ in the Dunford–Jordan decomposition is unique.

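e) Find the Dunford–Jordan decomposition of the matrices

$$A = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0 & -3 \\ 1 & -1 & -6 \\ -1 & 2 & 5 \end{pmatrix}.$$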
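The following is a minimal Python sketch for checking a hand computation in part e). It uses exact rational arithmetic from the standard `fractions` module. It assumes that the eigenvalues $\lambda_i$ and multiplicities $k_i$ are supplied by hand. For $C$, a direct expansion gives $\chi_C(X) = (X-1)(X-2)^2$. The sketch finds a basis of each $C_i = \ker(T - \lambda_i \cdot \mathrm{id})^{k_i}$ by Gaussian reduction. It then builds $D$ as the map acting as $\lambda_i \cdot \mathrm{id}$ on $C_i$, which is exactly $D = \sum_i \lambda_i \pi_{\lambda_i}$ from part b), and sets $N = T - D$. All helper names (`rref`, `kernel_basis`, `dunford`, ...) are our own and are not taken from the text.

```python
from fractions import Fraction


def identity(n):
    """The n x n identity matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def rref(M):
    """Gauss-Jordan reduction of a copy of M; returns (reduced matrix, pivot columns)."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue                                   # no pivot in this column
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]                 # make the pivot equal to 1
        for i in range(rows):
            if i != r and M[i][c] != 0:                # clear the rest of column c
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots


def kernel_basis(A):
    """Basis of ker A, one vector per free (non-pivot) column."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -R[row][free]
        basis.append(v)
    return basis


def inverse(P):
    """Inverse of an invertible matrix, by reducing [P | I] to [I | P^-1]."""
    n = len(P)
    ident = identity(n)
    R, _ = rref([list(P[i]) + ident[i] for i in range(n)])
    return [row[n:] for row in R]


def dunford(T, spectrum):
    """(D, N) for T; spectrum = [(lambda_i, k_i), ...] is assumed to satisfy
    chi_T(X) = prod_i (X - lambda_i)^k_i (supplied by hand, not computed here)."""
    n = len(T)
    T = [[Fraction(x) for x in row] for row in T]
    basis, diag = [], []
    for lam, k in spectrum:
        shifted = [[T[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
        M = identity(n)
        for _ in range(k):
            M = matmul(M, shifted)                     # M = (T - lam id)^k
        C_i = kernel_basis(M)                          # basis of the characteristic subspace
        basis += C_i
        diag += [Fraction(lam)] * len(C_i)
    P = [[basis[j][i] for j in range(n)] for i in range(n)]   # basis vectors as columns
    Lam = [[diag[i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    D = matmul(matmul(P, Lam), inverse(P))             # D = lambda_i * id on each C_i
    N = [[T[i][j] - D[i][j] for j in range(n)] for i in range(n)]
    return D, N


C = [[1, 0, -3], [1, -1, -6], [-1, 2, 5]]
D, N = dunford(C, [(1, 1), (2, 2)])                    # chi_C(X) = (X - 1)(X - 2)^2
```

You can check the returned pair by confirming that `matmul(N, N)` is the zero matrix and that `matmul(D, N) == matmul(N, D)`, as parts b) and d) predict.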


Chapter 10 
Forms 


Abstract This chapter has a strong geometrical flavor. It starts with a discussion 
of bilinear and quadratic forms and uses this to introduce Euclidean spaces and 
establish their main geometric properties. This is in turn applied to linear algebra, 
leading to a classification of symmetric and orthogonal matrices with real entries. 


Keywords Quadratic form • Bilinear form • Polar form • Euclidean space
• Inner-product • Positive-definite matrix • Orthogonal projection
• Gram-Schmidt algorithm


The goal of this last chapter is to make a rather detailed study of Euclidean spaces 
over the real numbers. Euclidean spaces make the link between linear algebra, 
geometry and analysis. They are therefore of fundamental importance. The geometric
insight they offer also reveals unexpected and deep properties of symmetric
and orthogonal matrices. Thus, on the one hand, proving the fundamental theorems
concerning Euclidean spaces will use essentially everything we have developed so
far, which makes this an opportunity to see real applications of linear algebra; on the
other hand, the geometry of Euclidean spaces helps in discovering and proving many
interesting properties of matrices! Among the important topics discussed in this
chapter, we mention: basic properties of bilinear and quadratic forms, orthogonality
and inequalities in Euclidean spaces, orthogonal projections and their applications
to minimization problems, orthogonal bases and their applications (for instance
to Fourier analysis), the classification of isometries (i.e., linear transformations
preserving distance) of a Euclidean space, the classification of symmetric matrices
and its applications to matrix inequalities, norms, etc. Throughout this chapter we work
with the field $F = \mathbb{R}$ of real numbers. Many exercises (left to the reader) are devoted
to the analogous theory over the field of complex numbers.
10.1 Bilinear and Quadratic Forms 


We have already introduced the notion of d-linear form on a vector space in the 
chapter devoted to determinants. We will be concerned with a special case of this 
notion, and for the reader’s convenience we will give the detailed definition in this 
special case: 


Definition 10.1. Let $V$ be a vector space over $\mathbb{R}$. A bilinear form on $V$ is a map
$b : V \times V \to \mathbb{R}$ such that

• For all $x \in V$ the map $b(x, \cdot) : V \to \mathbb{R}$ sending $v$ to $b(x, v)$ is linear.
• For all $y \in V$ the map $b(\cdot, y) : V \to \mathbb{R}$ sending $v$ to $b(v, y)$ is linear.

The bilinear form $b$ is called symmetric if $b(x, y) = b(y, x)$ for all $x, y \in V$.


Remark 10.2. If $x_1, \dots, x_n \in V$, $y_1, \dots, y_m \in V$ and $a_1, \dots, a_n, c_1, \dots, c_m \in \mathbb{R}$,
then for any bilinear form $b$ on $V$ we have

$$b\Big(\sum_{i=1}^{n} a_i x_i, \sum_{j=1}^{m} c_j y_j\Big) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i c_j\, b(x_i, y_j). \qquad (10.1)$$

In particular, if $V$ is finite dimensional and if $e_1, \dots, e_n$ is a basis of $V$, then $b$ is
uniquely determined by its values at the pairs $(e_i, e_j)$ with $1 \le i, j \le n$ (i.e., if
$b, b'$ are bilinear forms on $V$ and $b(e_i, e_j) = b'(e_i, e_j)$ for all $1 \le i, j \le n$, then
$b = b'$).
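On $V = \mathbb{R}^n$ with the canonical basis, Remark 10.2 means that a bilinear form amounts to the array of its values $b(e_i, e_j)$. The short Python sketch below rebuilds $b$ from these values using formula (10.1). The function name `bilinear_from_values` and the matrix `G` are hypothetical choices made only for illustration.

```python
def bilinear_from_values(G):
    """Rebuild b on R^n from G[i][j] = b(e_i, e_j), using formula (10.1):
    b(x, y) = sum over i, j of x_i * y_j * b(e_i, e_j)."""
    n = len(G)

    def b(x, y):
        return sum(x[i] * y[j] * G[i][j] for i in range(n) for j in range(n))

    return b


G = [[1, 2], [3, 4]]          # hypothetical values b(e_i, e_j) on R^2
b = bilinear_from_values(G)
value = b([1, 2], [3, -1])    # 1*3*1 + 1*(-1)*2 + 2*3*3 + 2*(-1)*4 = 11
```

Here $b(e_1, e_2) = 2$ but $b(e_2, e_1) = 3$, so the form is bilinear without being symmetric. Symmetry corresponds exactly to $G$ being a symmetric array.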

Example 10.3. a) If $a_1, \dots, a_n$ are real numbers and $V = \mathbb{R}^n$, then setting, for $x =
(x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$,

$$b(x, y) = a_1 x_1 y_1 + \dots + a_n x_n y_n$$

yields a symmetric bilinear form on $V$. The choice $a_1 = \dots = a_n = 1$ is
particularly important and in this case we call $b$ the canonical inner product
on $\mathbb{R}^n$.
b) Consider the space $V$ of continuous, real-valued functions on $[-1, 1]$. Then

$$b(f, g) = \int_{-1}^{1} f(x) g(x)\, dx$$

is a symmetric bilinear form on $V$, as the reader can easily check.
c) Let $V$ be the space $M_n(\mathbb{R})$ of $n \times n$ matrices with real entries, and consider the
map $b : V \times V \to \mathbb{R}$ defined by

$$b(A, B) = \mathrm{Tr}(AB).$$

Then $b$ is a symmetric bilinear form on $V$. A slightly different and more
commonly used variant is

$$b'(A, B) = \mathrm{Tr}({}^tA\, B).$$

The reason why $b'$ is preferred to $b$ is that one can easily check that if $A = [a_{ij}]$
and $B = [b_{ij}]$, then

$$b'(A, B) = \sum_{i,j=1}^{n} a_{ij} b_{ij},$$

that is, if we identify $V = \mathbb{R}^{n^2}$ via the canonical basis of $V$, then $b'$ becomes
identified with the canonical inner product on $\mathbb{R}^{n^2}$.

d) Let $V$ be the space of sequences $(x_n)_{n \ge 1}$ of real numbers for which $\sum_{n \ge 1} x_n^2$ is
a convergent series. Define

$$b(x, y) = \sum_{n \ge 1} x_n y_n$$

for $x = (x_n)_{n \ge 1}$ and $y = (y_n)_{n \ge 1}$ in $V$. Note that the series $\sum_{n \ge 1} x_n y_n$
converges since it converges absolutely. Indeed, we have $(|x_n| - |y_n|)^2 \ge 0$,
which can be written as

$$|x_n y_n| \le \frac{x_n^2 + y_n^2}{2},$$

and by assumption the series with general term $\frac{x_n^2 + y_n^2}{2}$ converges. One can easily
check that $b$ is a symmetric bilinear form on $V$.

e) Let $V$ be the space of polynomials with real coefficients and, for $P, Q \in V$,
define

$$b(P, Q) = \sum_{n \ge 1} \frac{P(n) Q(n)}{2^n}.$$

Note that the series converges absolutely, since $n^k / 2^n = O(1/n^2)$ for all $k \ge 1$.
Then $b$ is a symmetric bilinear form.
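The identity $b'(A, B) = \mathrm{Tr}({}^tA\, B) = \sum_{i,j} a_{ij} b_{ij}$ from part c) can be checked numerically. The following minimal Python sketch uses plain nested lists, and the matrices are arbitrary examples. The final line is our own illustration and is not a claim made in the text: it shows that $b(A, A) = \mathrm{Tr}(A^2)$ can be negative, while $b'(A, A) = \sum_{i,j} a_{ij}^2 \ge 0$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def b_prime(A, B):
    """b'(A, B) = Tr(tA B)."""
    return trace(matmul(transpose(A), B))


A = [[1, 2], [3, 4]]
B = [[5, -1], [0, 2]]
entrywise = sum(a * c for row_a, row_b in zip(A, B) for a, c in zip(row_a, row_b))
J = [[0, 1], [-1, 0]]         # Tr(J J) = -2 < 0, while b'(J, J) = 2
```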


It follows easily by unwinding definitions that the set of all bilinear forms on $V$ is
naturally a vector subspace of the vector space of all maps $V \times V \to \mathbb{R}$. Moreover,
the subset of symmetric bilinear forms is a subspace of the space of all bilinear
forms on $V$. To any bilinear form $b$ one can attach a map of one variable

$$q : V \to \mathbb{R}, \quad q(x) = b(x, x).$$

This is called the quadratic form attached to $b$. Let us formally define quadratic
forms:


Definition 10.4. A quadratic form on $V$ is a map $q : V \to \mathbb{R}$ for which there is a
bilinear form $b : V \times V \to \mathbb{R}$ such that $q(x) = b(x, x)$ for all $x \in V$.


A natural question is whether the bilinear form $b$ attached to a quadratic form
as in the previous definition is uniquely determined by the quadratic form. So the
question is whether we can have two different bilinear forms $b_1, b_2$ such that

$$b_1(x, x) = b_2(x, x)$$

for all $x$. Stated differently (take $b = b_1 - b_2$), is there a nonzero bilinear form $b$ such that $b(x, x) = 0$
for all $x \in V$? The answer is yes: consider the bilinear form $b : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$
defined by

$$b((x_1, y_1), (x_2, y_2)) = x_1 y_2 - x_2 y_1.$$

Clearly this is a nonzero bilinear form and $b(x, x) = 0$ for all $x$. On the other hand,
if we further impose that $b$ should be symmetric, then we have uniqueness, as
the following fundamental theorem shows:


Theorem 10.5. For any quadratic form $q : V \to \mathbb{R}$ there is a unique symmetric
bilinear form $b : V \times V \to \mathbb{R}$ such that $q(x) = b(x, x)$ for all $x \in V$. It is
determined by the polarization identity

$$b(x, y) = \frac{q(x + y) - q(x) - q(y)}{2}.$$

Proof. Fix a quadratic form $q : V \to \mathbb{R}$. By hypothesis we can find a bilinear (but
not necessarily symmetric) form $B$ such that $q(x) = B(x, x)$ for all $x \in V$. Define
a map $b : V \times V \to \mathbb{R}$ by

$$b(x, y) = \frac{q(x + y) - q(x) - q(y)}{2}.$$

We claim that $b$ is a symmetric bilinear form and $b(x, x) = q(x)$. By definition,
we have

$$b(x, y) = \frac{B(x + y, x + y) - B(x, x) - B(y, y)}{2}.$$

Since $B$ is bilinear, we can write

$$B(x + y, x + y) = B(x, x) + B(x, y) + B(y, x) + B(y, y).$$

Thus

$$b(x, y) = \frac{B(x, y) + B(y, x)}{2}.$$

This makes it clear that $b(x, x) = B(x, x) = q(x)$ and that $b(x, y) = b(y, x)$ for
all $x, y \in V$. It remains to see that $b$ is bilinear. But for fixed $x$ the maps $B(x, \cdot)$ and
$B(\cdot, x)$ are linear (since $B$ is bilinear), thus so is the map

$$b(x, \cdot) = \frac{B(x, \cdot) + B(\cdot, x)}{2}.$$

Similarly, $b(\cdot, x)$ is linear for all $x \in V$, establishing that $b$ is bilinear and proving
the claim.

Let us now show that $b$ is unique. If $b'$ is another symmetric bilinear form such
that $b'(x, x) = q(x)$ for all $x$, then a computation as in the previous paragraph gives

$$q(x + y) = b'(x + y, x + y) = b'(x, x) + 2b'(x, y) + b'(y, y) = q(x) + q(y) + 2b'(x, y),$$

thus necessarily $b'(x, y) = \frac{q(x + y) - q(x) - q(y)}{2} = b(x, y)$ for all $x, y$, that is $b' = b$. □


Definition 10.6. If b is attached to q as in the previous theorem, we call b the polar 
form of q. 
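The polarization identity recovers the polar form from the values of $q$ alone. The following Python sketch is illustrative, and its function names are our own. It starts from the non-symmetric matrix $A = \begin{pmatrix} 1 & 4 \\ 0 & 3 \end{pmatrix}$, meaning the bilinear form $B(x, y) = \sum_{i,j} a_{ij} x_i y_j$, whose quadratic form is $q(x) = x_1^2 + 4x_1x_2 + 3x_2^2$. Polarization gives the symmetric form with $b(e_1, e_2) = b(e_2, e_1) = 2$, even though $B(e_1, e_2) = 4$ and $B(e_2, e_1) = 0$.

```python
from fractions import Fraction


def bilinear_from_matrix(A):
    """B(x, y) = sum_{i,j} a_ij x_i y_j (not symmetric unless A is)."""
    def B(x, y):
        return sum(A[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))
    return B


def polar_form(q):
    """Polar form of q via the polarization identity b(x, y) = (q(x + y) - q(x) - q(y)) / 2."""
    def b(x, y):
        s = [xi + yi for xi, yi in zip(x, y)]
        return Fraction(q(s) - q(x) - q(y), 2)
    return b


A = [[1, 4], [0, 3]]
B = bilinear_from_matrix(A)


def q(x):
    """q(x1, x2) = B(x, x) = x1^2 + 4 x1 x2 + 3 x2^2."""
    return B(x, x)


b = polar_form(q)
e1, e2 = [1, 0], [0, 1]
```

The values $b(e_i, e_j)$ form the symmetric matrix $\frac{1}{2}(A + {}^tA)$. This is the only symmetric bilinear form with quadratic form $q$, as Theorem 10.5 asserts.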


Example 10.7. a) Consider the space $V = \mathbb{R}^n$ and the map $q : \mathbb{R}^n \to \mathbb{R}$ defined by
$$q(x_1, \dots, x_n) = x_1^2 + x_2^2 + \dots + x_n^2.$$
Then $q$ is a quadratic form and its polar form is
$$b((x_1, \dots, x_n), (y_1, \dots, y_n)) = x_1 y_1 + \dots + x_n y_n.$$
Indeed, let us compute for $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$

$$\frac{q(x + y) - q(x) - q(y)}{2} = \frac{\sum_{i=1}^{n} (x_i + y_i)^2 - \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} y_i^2}{2} = \sum_{i=1}^{n} x_i y_i.$$

The map $(x, y) \mapsto \sum_{i=1}^{n} x_i y_i$ being bilinear and symmetric, it follows on the
one hand that $q$ is a quadratic form and on the other hand that $b$ is its polar form.

b) Consider the space $V$ of continuous real-valued maps on $[0, 1]$ and define
$q : V \to \mathbb{R}$ by

$$q(f) = \int_0^1 f(x)^2\, dx.$$

To see that $q$ is a quadratic form and to find the polar form of $q$, we compute

$$\frac{q(f + g) - q(f) - q(g)}{2} = \frac{\int_0^1 (f(x) + g(x))^2\, dx - \int_0^1 f(x)^2\, dx - \int_0^1 g(x)^2\, dx}{2} = \int_0^1 f(x) g(x)\, dx.$$

Since the map $b$ defined by $b(f, g) = \int_0^1 f(x) g(x)\, dx$ is bilinear and symmetric,
it follows that $q$ is a quadratic form with polar form $b$.
c) As a counter-example, consider the map $q : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$q(x, y) = x^2 + 2y^2 + 3x.$$

We claim that $q$ is not a quadratic form. Indeed, otherwise, letting $b$ be its polar form,
we would have

$$b((x, y), (x, y)) = x^2 + 2y^2 + 3x$$

for all $x, y \in \mathbb{R}$. Replacing $x$ by $-x$ and $y$ by $-y$ and taking into account that
$b$ is bilinear, we obtain

$$x^2 + 2y^2 + 3x = b((x, y), (x, y)) = b(-(x, y), -(x, y)) = b((-x, -y), (-x, -y)) = x^2 + 2y^2 - 3x,$$

thus $6x = 0$, and this for all $x \in \mathbb{R}$, which is plainly absurd.


The previous theorem therefore establishes a bijection between quadratic
forms and symmetric bilinear forms: any symmetric bilinear form $b$ determines
a quadratic form $x \mapsto b(x, x)$, and any quadratic form determines a symmetric
bilinear form, namely its polar form.


Problem 10.8. Let $q$ be a quadratic form on $V$, with polar form $b$.
a) Prove that for all $x, y \in V$

$$b(x, y) = \frac{q(x + y) - q(x - y)}{4}.$$

b) (Parallelogram law) Prove that for all $x, y \in V$

$$q(x + y) + q(x - y) = 2(q(x) + q(y)).$$

c) (Pythagorean theorem) Prove that for all $x, y \in V$ we have $b(x, y) = 0$ if and
only if
$$q(x + y) = q(x) + q(y).$$
Solution. a) By the polarization identity we have
$$q(x + y) = q(x) + q(y) + 2b(x, y)$$
and (noting that $q(-y) = q(y)$ and $b(x, -y) = -b(x, y)$)
$$q(x - y) = q(x) + q(y) - 2b(x, y).$$

Subtracting the two previous relations yields the desired result.
b) It suffices to add the two relations established in the proof of part a).
c) This follows directly from the polarization identity $q(x + y) = q(x) + q(y) + 2b(x, y)$. □


Let us try to understand the quadratic forms on $\mathbb{R}^n$. If $q$ is a quadratic form on
$\mathbb{R}^n$ with polar form $b$, and if $e_1, \dots, e_n$ is the canonical basis of $\mathbb{R}^n$, then for all
$x = x_1 e_1 + \dots + x_n e_n \in \mathbb{R}^n$ we have, using Remark 10.2,

$$q(x_1, \dots, x_n) = b(x_1 e_1 + \dots + x_n e_n, x_1 e_1 + \dots + x_n e_n) = \sum_{i,j=1}^{n} b(e_i, e_j) x_i x_j = \sum_{i,j=1}^{n} a_{ij} x_i x_j,$$

with $a_{ij} = b(e_i, e_j)$. Notice that since $b(e_i, e_j) = b(e_j, e_i)$, we have $a_{ij} = a_{ji}$,
thus any quadratic form $q$ on $\mathbb{R}^n$ can be written

$$q(x_1, \dots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j = \sum_{i=1}^{n} a_{ii} x_i^2 + 2 \sum_{1 \le i < j \le n} a_{ij} x_i x_j,$$

with $A = [a_{ij}]$ a symmetric matrix.
Conversely, if $A = [a_{ij}]$ is any matrix in $M_n(\mathbb{R})$ (not necessarily symmetric),
then the map

$$q : \mathbb{R}^n \to \mathbb{R}, \quad q(x_1, \dots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j$$

is a quadratic form on $\mathbb{R}^n$, with polar form

$$b(x, y) = \sum_{i,j=1}^{n} a_{ij} \frac{x_i y_j + x_j y_i}{2}.$$

We leave this as an easy exercise for the reader. Notice that

$$q(x) = \sum_{i,j=1}^{n} a_{ij} x_i x_j = \sum_{i,j=1}^{n} b_{ij} x_i x_j,$$

with

$$b_{ij} = \frac{a_{ij} + a_{ji}}{2},$$

and the matrix $B = [b_{ij}]$ is symmetric.
There is another natural way of constructing quadratic forms on $\mathbb{R}^n$: pick real
numbers $\alpha_1, \dots, \alpha_r$ and linear forms $l_1, \dots, l_r$ on $\mathbb{R}^n$, and set

$$q(x) = \alpha_1 l_1(x)^2 + \dots + \alpha_r l_r(x)^2.$$

Then $q$ is a quadratic form on $\mathbb{R}^n$, with associated polar form given by

$$b(x, y) = \sum_{i=1}^{r} \alpha_i l_i(x) l_i(y),$$

as the reader can easily check. The following amazing result due to Gauss says that
we obtain in this way all quadratic forms on $\mathbb{R}^n$. Moreover, Gauss described an
algorithm which allows us to write a given quadratic form $q$ in the form

$$q = \alpha_1 l_1^2 + \dots + \alpha_r l_r^2,$$

with $l_1, \dots, l_r$ linearly independent linear forms. This algorithm will be described
in the (long) proof of the following theorem.


Theorem 10.9 (Gauss). Let $q$ be a quadratic form on $V = \mathbb{R}^n$. There are real
numbers $\alpha_1, \dots, \alpha_r$ and linearly independent linear forms $l_1, \dots, l_r \in V^*$ such that
for all $x \in V$

$$q(x) = \alpha_1 l_1(x)^2 + \dots + \alpha_r l_r(x)^2.$$


Before giving the proof of the theorem, let us make some further remarks on the
statement. Of course, we may assume that $\alpha_i \neq 0$ for $1 \le i \le r$, otherwise simply
delete the corresponding term $\alpha_i l_i^2$. Let $I$ be the set of those $i$ for which $\alpha_i > 0$ and
let $J$ be the set of those $i$ for which $\alpha_i < 0$. Then

$$q(x) = \sum_{i \in I} \alpha_i l_i(x)^2 - \sum_{i \in J} (-\alpha_i) l_i(x)^2,$$

and defining

$$L_i = \sqrt{\alpha_i}\, l_i \ \text{ if } i \in I \quad \text{and} \quad L_i = \sqrt{-\alpha_i}\, l_i \ \text{ if } i \in J,$$

we obtain

$$q = \sum_{i \in I} L_i^2 - \sum_{i \in J} L_i^2.$$

Moreover, since $l_1, \dots, l_r$ are linearly independent, so are $L_1, \dots, L_r$. In other
words, we can refine the previous theorem by asking that $\alpha_i \in \{-1, 1\}$ for all $i$.
One can prove that the numbers of elements of $I$ and $J$, as well as the number $r$, are
uniquely determined by $q$ (this is Sylvester’s inertia theorem). The pair $(|I|, |J|)$
consisting of the numbers of elements of $I$ and $J$ is called the signature of $q$. We
call $|I| + |J| = r$ the rank of $q$ (we will see another interpretation of $r$ later on,
which will also explain its name).

We now start the algorithmic proof of Theorem 10.9, by induction on $n$. For
$n = 1$ we can write $q(x_1) = \alpha_1 x_1^2$, where $x_1 \in \mathbb{R}$ and $\alpha_1 = q(1) \in \mathbb{R}$, so the result
holds.

Assume now that the result holds for $n - 1$. We can write

$$q(x_1, \dots, x_n) = \sum_{i=1}^{n} a_{ii} x_i^2 + 2 \sum_{1 \le i < j \le n} a_{ij} x_i x_j$$

for some scalars $a_{ij} \in \mathbb{R}$. We will discuss two cases:


• There is $i \in \{1, 2, \dots, n\}$ such that $a_{ii} \neq 0$. Without loss of generality, we may
assume that $a_{nn} \neq 0$. We consider $q(x_1, \dots, x_n)$ as a quadratic polynomial in the
variable $x_n$ and complete the square, to obtain

$$q(x_1, \dots, x_n) = a_{nn} x_n^2 + 2\Big(\sum_{i=1}^{n-1} a_{in} x_i\Big) x_n + \sum_{i=1}^{n-1} a_{ii} x_i^2 + 2 \sum_{1 \le i < j \le n-1} a_{ij} x_i x_j$$
$$= a_{nn}\Big(x_n + \sum_{i=1}^{n-1} \frac{a_{in}}{a_{nn}} x_i\Big)^2 - a_{nn}\Big(\sum_{i=1}^{n-1} \frac{a_{in}}{a_{nn}} x_i\Big)^2 + \sum_{i=1}^{n-1} a_{ii} x_i^2 + 2 \sum_{1 \le i < j \le n-1} a_{ij} x_i x_j$$
$$= a_{nn}\Big(x_n + \sum_{i=1}^{n-1} \frac{a_{in}}{a_{nn}} x_i\Big)^2 + q'(x_1, \dots, x_{n-1}),$$

where $q'$ is a quadratic form on $\mathbb{R}^{n-1}$. By induction, we can write

$$q'(x_1, \dots, x_{n-1}) = \sum_{i=1}^{r} \alpha_i L_i(x_1, \dots, x_{n-1})^2$$

for some linearly independent linear forms $L_i$ in $x_1, \dots, x_{n-1}$. Defining

$$l_{r+1}(x_1, \dots, x_n) = x_n + \sum_{i=1}^{n-1} \frac{a_{in}}{a_{nn}} x_i, \qquad \alpha_{r+1} = a_{nn}$$

and

$$l_i(x_1, \dots, x_n) = L_i(x_1, \dots, x_{n-1})$$

for $1 \le i \le r$, we obtain

$$q(x) = \sum_{i=1}^{r+1} \alpha_i l_i(x)^2$$

for all $x \in V$ and the inductive step is finished (we leave it to the reader to check
that $l_1, \dots, l_{r+1}$ are linearly independent).

• All $a_{ii} = 0$. If all $a_{ij} = 0$, then $q = 0$ and the result is clear. If not, without loss
of generality we may assume that $a_{n-1,n} \neq 0$. We use the identity

$$axy + bx + cy = a\Big(xy + \frac{b}{a}x + \frac{c}{a}y\Big) = a\Big(x + \frac{c}{a}\Big)\Big(y + \frac{b}{a}\Big) - \frac{bc}{a}$$

to rewrite

$$q(x_1, \dots, x_n) = 2a_{n-1,n} x_{n-1} x_n + 2\sum_{i=1}^{n-2} a_{in} x_i x_n + 2\sum_{i=1}^{n-2} a_{i,n-1} x_i x_{n-1} + 2\sum_{1 \le i < j \le n-2} a_{ij} x_i x_j$$
$$= 2a_{n-1,n}\Big(x_{n-1} + \sum_{i=1}^{n-2} \frac{a_{in}}{a_{n-1,n}} x_i\Big)\Big(x_n + \sum_{i=1}^{n-2} \frac{a_{i,n-1}}{a_{n-1,n}} x_i\Big) + q'(x_1, \dots, x_{n-2})$$

for some quadratic form $q'$ on $\mathbb{R}^{n-2}$. Applying the inductive hypothesis, we can
write

$$q'(x_1, \dots, x_{n-2}) = \sum_{i=1}^{r} \alpha_i L_i(x_1, \dots, x_{n-2})^2$$

for some linearly independent linear forms $L_i$ of $x_1, \dots, x_{n-2}$, as well as some
scalars $\alpha_i$. Using the identity

$$ab = \frac{(a + b)^2 - (a - b)^2}{4}$$

we obtain

$$2a_{n-1,n}\Big(x_{n-1} + \sum_{i=1}^{n-2} \frac{a_{in}}{a_{n-1,n}} x_i\Big)\Big(x_n + \sum_{i=1}^{n-2} \frac{a_{i,n-1}}{a_{n-1,n}} x_i\Big) = \frac{a_{n-1,n}}{2}\Big(l_{r+1}(x_1, \dots, x_n)^2 - l_{r+2}(x_1, \dots, x_n)^2\Big),$$

where

$$l_{r+1}(x_1, \dots, x_n) = x_{n-1} + x_n + \sum_{i=1}^{n-2} \frac{a_{in} + a_{i,n-1}}{a_{n-1,n}} x_i$$

and

$$l_{r+2}(x_1, \dots, x_n) = x_{n-1} - x_n + \sum_{i=1}^{n-2} \frac{a_{in} - a_{i,n-1}}{a_{n-1,n}} x_i.$$

All in all, setting $l_i(x_1, \dots, x_n) = L_i(x_1, \dots, x_{n-2})$ for $1 \le i \le r$ and

$$\alpha_{r+1} = -\alpha_{r+2} = \frac{a_{n-1,n}}{2},$$

we have

$$q(x) = \sum_{i=1}^{r+2} \alpha_i l_i(x)^2.$$

We leave it to the reader to check that $l_1, \dots, l_{r+2}$ are linearly independent. This
finishes the proof of Theorem 10.9.
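The two cases of the proof translate into an algorithm. Below is a minimal Python sketch that uses exact fractions. It assumes $q$ is given by its symmetric matrix $S = [a_{ij}]$, so that $q(x) = \sum_{i,j} a_{ij} x_i x_j$. Each step does one of two things. In the first case it completes a square. In the second case it applies $2a\,uv = \frac{a}{2}(u + v)^2 - \frac{a}{2}(u - v)^2$. It then subtracts $\alpha\, l\, {}^t l$ from $S$, which removes the variables just treated. Each linear form $l_i$ is stored as its coefficient vector. Unlike the proof, the sketch picks the first usable index instead of relabeling it as $n$ or $(n-1, n)$. The names `gauss_reduction`, `signature` and `rebuild` are ours.

```python
from fractions import Fraction


def gauss_reduction(S):
    """Write q(x) = sum_{i,j} S[i][j] x_i x_j (S symmetric) as sum alpha * l(x)^2.

    Returns a list of pairs (alpha, l), l being the coefficient vector of a linear form.
    """
    n = len(S)
    S = [[Fraction(v) for v in row] for row in S]
    terms = []
    while True:
        i = next((k for k in range(n) if S[k][k] != 0), None)
        if i is not None:
            # First case: a_ii != 0, complete the square in x_i.
            a = S[i][i]
            new = [(a, [S[i][k] / a for k in range(n)])]
        else:
            pair = next(((r, c) for r in range(n) for c in range(r + 1, n) if S[r][c] != 0), None)
            if pair is None:
                return terms                      # what is left of q is 0
            # Second case: all a_kk = 0 but a_ij != 0.
            i, j = pair
            a = S[i][j]
            u = [S[j][k] / a for k in range(n)]   # x_i + sum_{k != i, j} (a_jk / a) x_k
            v = [S[i][k] / a for k in range(n)]   # x_j + sum_{k != i, j} (a_ik / a) x_k
            # 2a * u * v = (a/2) (u + v)^2 - (a/2) (u - v)^2
            new = [(a / 2, [s + t for s, t in zip(u, v)]),
                   (-a / 2, [s - t for s, t in zip(u, v)])]
        for alpha, l in new:                      # S <- S - alpha * l * tl
            for r in range(n):
                for c in range(n):
                    S[r][c] -= alpha * l[r] * l[c]
        terms.extend(new)


def signature(terms):
    """(number of positive alpha_i, number of negative alpha_i)."""
    return (sum(1 for a, _ in terms if a > 0), sum(1 for a, _ in terms if a < 0))


def rebuild(terms, n):
    """Matrix of sum alpha * l(x)^2, used to check a decomposition."""
    M = [[Fraction(0)] * n for _ in range(n)]
    for alpha, l in terms:
        for r in range(n):
            for c in range(n):
                M[r][c] += alpha * l[r] * l[c]
    return M


half = Fraction(1, 2)
S_a = [[0, half, half], [half, 0, half], [half, half, 0]]     # q = xy + yz + zx
S_b = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]                 # q = (x-y)^2 + (y-z)^2 + (z-x)^2
terms_a, terms_b = gauss_reduction(S_a), gauss_reduction(S_b)
```

For the second form, the sketch returns exactly $2\big(x - \frac{y+z}{2}\big)^2 + \frac{3}{2}(y - z)^2$. For $xy + yz + zx$ it pairs $x$ and $y$ first and gives $\frac{1}{4}(x + y + 2z)^2 - \frac{1}{4}(x - y)^2 - z^2$. This is a different decomposition from a hand computation that pairs other variables, but it has the same signature $(1, 2)$, as Sylvester’s theorem requires.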


Problem 10.10. Implement the algorithm described in the previous proof in each
of the following cases:

a) $q$ is the quadratic form on $\mathbb{R}^3$ defined by

$$q(x, y, z) = xy + yz + zx.$$

b) $q$ is the quadratic form on $\mathbb{R}^3$ defined by

$$q(x, y, z) = (x - y)^2 + (y - z)^2 + (z - x)^2.$$
Solution. a) With the notations of the previous proof, we have $a_{ii} = 0$ for $1 \le i \le 3$,
thus we are in the second case of the above proof. We focus on the nonzero
term $yz$ and we write

$$q(x, y, z) = (y + x)(z + x) - x^2.$$

Next, using the identity

$$ab = \frac{(a + b)^2 - (a - b)^2}{4}$$

we obtain

$$(y + x)(z + x) = \frac{(2x + y + z)^2 - (y - z)^2}{4}.$$

We conclude that

$$q(x, y, z) = \frac{1}{4}(2x + y + z)^2 - \frac{1}{4}(y - z)^2 - x^2$$

and one easily checks that the linear forms $2x + y + z$, $y - z$ and $x$ are linearly
independent.
b) It is tempting to say that $q$ is already written in the desired form, but the problem
is that the linear forms $x - y$, $y - z$, and $z - x$ are not linearly independent (they
add up to 0). Therefore we write (by brutal expansion)

$$q(x, y, z) = (x - y)^2 + (y - z)^2 + (z - x)^2 = 2(x^2 + y^2 + z^2 - xy - yz - zx).$$

We are in the first case of the previous proof, so we focus on the term $x^2$ and try
to complete the square:

$$q(x, y, z) = 2\Big(x - \frac{y + z}{2}\Big)^2 - \frac{(y + z)^2}{2} + 2y^2 + 2z^2 - 2yz = 2\Big(x - \frac{y + z}{2}\Big)^2 + \frac{3y^2 + 3z^2 - 6yz}{2} = 2\Big(x - \frac{y + z}{2}\Big)^2 + \frac{3}{2}(y - z)^2$$

and we easily check that the linear forms $x - \frac{y + z}{2}$ and $y - z$ are linearly
independent. □
1. Prove that the map
$$b : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}, \quad b((x, y), (z, t)) = xt - yz$$

is a bilinear form on $\mathbb{R}^2$. Describe the associated quadratic form.
2. Consider the map $q : \mathbb{R}^4 \to \mathbb{R}$,


q(x, y,z,t) = xy $22 + tx- t. 


a) Prove that $q$ is a quadratic form and find its polar form.

b) Implement Gauss’ algorithm and write $q$ in the form $\sum_{i=1}^{r} \alpha_i l_i^2$ with real
numbers $\alpha_i$ and linearly independent linear forms $l_i$.

c) What is the signature of q? 


3. Use Gauss’ algorithm to write each of the following quadratic forms as
$\sum_{i=1}^{r} \alpha_i l_i^2$ with linearly independent linear forms $l_1, \dots, l_r$ and scalars
$\alpha_1, \dots, \alpha_r$.


a) $q(x, y, z) = (x - 2y + z)^2 - (x - y + z)^2$.

b) (x,y,z) = (x — 2y +2)? + (y — 224+ x) — -2x + yy. 
c) q(x, y,z,t) = xy + yz + zt + tx. 

d) $q(x, y, z) = x^2 + xy + yz + zx$.


For each of these quadratic forms, find its signature and its rank. 
4. a) If $q$ is a quadratic form on $\mathbb{R}^n$, is it true that $\{x \in \mathbb{R}^n \mid q(x) = 0\}$ is a vector
subspace of $\mathbb{R}^n$?
b) Describe geometrically $\{x \in \mathbb{R}^n \mid q(x) = 0\}$ if $q(x, y) = x^2 - 2y^2$,
if $q(x, y) = x^2 + y^2$ and finally if $q(x, y, z) = x^2 + y^2 - z^2$.
5. Which of the following maps are quadratic forms: 


a) q : R? > R, q(x, y, z2) = x? + y? +2. 
b) q : Rf > R, q(x, y,z, t) = xt -2 + zt — y. 
c) q : R > R, q(x, y,z, t) = (x +2(y + 4)? 
6. Let $V$ be the space of continuous real-valued maps on $[-1, 1]$ and consider the
map $b : V \times V \to \mathbb{R}$ defined by


1 
b(f.g) = f =P) fed + fe. 


a) Prove that b is a symmetric bilinear form on V. 
b) If $q$ is the associated quadratic form, find those $f \in V$ for which $q(f) = 0$.
7. Let $b$ be a bilinear form on a vector space $V$ over $\mathbb{R}$. The kernel of $b$ is the set
$\ker b$ defined by

$$\ker b = \{x \in V \mid b(x, y) = 0 \text{ for all } y \in V\}.$$

a) Prove that $\ker b$ is a vector subspace of $V$.

b) Find the kernel of the polar form of the quadratic form $q(x, y, z) = xy +
yz + zx$ on $\mathbb{R}^3$.

8. If $b$ is a bilinear form on a vector space $V$ over $\mathbb{R}$, is it true that $\{(x, y) \in
V \times V \mid b(x, y) = 0\}$ is a vector subspace of $V \times V$?

9. Let $V = M_n(\mathbb{R})$ and consider the map $q : V \to \mathbb{R}$ defined by

$$q(A) = \mathrm{Tr}({}^tA\, A) + (\mathrm{Tr}(A))^2.$$

Prove that $q$ is a quadratic form on $V$ and describe its polar form.

One can define bilinear forms over $\mathbb{C}$, but they do not have all the properties
one desires. Instead it is standard to take sesquilinear forms (sesqui- meaning
one-and-a-half).

Definition. Let $V$ be a vector space over $\mathbb{C}$. A sesquilinear form on $V$ is a map
$\varphi : V \times V \to \mathbb{C}$ such that

i) For all $x \in V$ the map $\varphi(x, \cdot) : V \to \mathbb{C}$ sending $y$ to $\varphi(x, y)$ is linear.
ii) For all $y \in V$ the map $\overline{\varphi}(\cdot, y) : V \to \mathbb{C}$ sending $x$ to the complex conjugate
$\overline{\varphi(x, y)}$ of $\varphi(x, y) \in \mathbb{C}$ is linear.

The sesquilinear form $\varphi$ is called conjugate symmetric or hermitian if
$\varphi(x, y) = \overline{\varphi(y, x)}$ for all $x, y \in V$.

In the next problems $V$ is a $\mathbb{C}$-vector space.
10. Prove that the set $S(V)$ of sesquilinear forms on $V$ is a vector subspace of the
$\mathbb{C}$-vector space of all maps $\varphi : V \times V \to \mathbb{C}$.
11. Prove that the set $H(V)$ of hermitian sesquilinear forms on $V$ is a vector
subspace of the $\mathbb{R}$-vector space $S(V)$. Is $H(V)$ a $\mathbb{C}$-vector subspace of $S(V)$?
12. Prove that we have a direct-sum decomposition of $\mathbb{R}$-vector spaces

$$S(V) = H(V) \oplus iH(V).$$

13. Let $\varphi$ be a hermitian sesquilinear form on $V$ and consider the map $\Phi : V \to \mathbb{C}$
defined by

$$\Phi(x) = \varphi(x, x).$$

A map $\Phi : V \to \mathbb{C}$ of this form is called a hermitian quadratic form and if
$\Phi(x) = \varphi(x, x)$ for all $x \in V$, we call the hermitian sesquilinear form $\varphi$ the
polar form of $\Phi$.
a) Prove that $\Phi(x) \in \mathbb{R}$ for all $x \in V$.
b) Prove that $\Phi(ax) = |a|^2 \Phi(x)$ for all $a \in \mathbb{C}$ and $x \in V$.
c) Prove that for all $x, y \in V$ we have

$$\Phi(x + y) = \Phi(x) + \Phi(y) + 2\,\mathrm{Re}(\varphi(x, y)).$$
d) Deduce the polarization identity
$$\Phi(x + y) - \Phi(x - y) + i\big(\Phi(x + iy) - \Phi(x - iy)\big) = 4\varphi(y, x).$$

Conclude that the polar form of a hermitian quadratic form is unique.
e) Prove the parallelogram law

$$\Phi(x + y) + \Phi(x - y) = 2(\Phi(x) + \Phi(y)).$$
14. Let $V = \mathbb{C}^n$ and consider the map $\Phi : V \to \mathbb{R}$ defined by
$$\Phi(x_1, \dots, x_n) = |x_1|^2 + |x_2|^2 + \dots + |x_n|^2$$
for all $(x_1, \dots, x_n) \in \mathbb{C}^n$. Prove that $\Phi$ is a hermitian quadratic form and find
its polar form.

15. Let $V$ be the space of continuous maps $f : [0, 1] \to \mathbb{C}$. Answer the same
questions as in the previous problem for the map $\Phi : V \to \mathbb{R}$ defined by

$$\Phi(f) = \int_0^1 |f(t)|^2\, dt.$$

16. Prove the complex analogue of Gauss’ theorem: if $\Phi$ is a hermitian quadratic
form on $\mathbb{C}^n$, then we can find $\alpha_1, \dots, \alpha_r \in \{-1, 1\}$ and linearly independent
linear forms $l_1, \dots, l_r$ on $\mathbb{C}^n$ such that for all $x \in \mathbb{C}^n$

$$\Phi(x_1, \dots, x_n) = \sum_{i=1}^{r} \alpha_i |l_i(x)|^2.$$


10.2 Positivity, Inner Products, and the Cauchy-Schwarz Inequality


A fundamental notion in the theory of bilinear and quadratic forms is that of
positivity:


Definition 10.11. a) A symmetric bilinear form $b : V \times V \to \mathbb{R}$ is called positive
if $b(x, x) \ge 0$ for all $x \in V$. We say that $b$ is positive definite if $b(x, x) > 0$ for
all nonzero vectors $x \in V$.
b) A quadratic form $q$ on $V$ is called positive (or positive definite) if its polar form
is positive (or positive definite). Thus $q$ is positive if $q(x) \ge 0$ for all $x \in V$, and
positive definite if moreover we have equality only for the zero vector.


Problem 10.12. Which of the following quadratic forms are positive? Which ones
are positive definite?

a) $q(x, y, z) = xy + yz + zx$.
b) $q(x, y, z) = x^2 + (y - z)^2 + (z - x)^2$.
c) $q(x, y, z) = x^2 + y^2 + z^2 - xy - yz - zx$.

Solution. a) We have to check whether $xy + yz + zx \ge 0$ for all real numbers
$x, y, z$. This is definitely not the case, since taking $z = 0$ we would have $xy \ge 0$
for all $x, y \in \mathbb{R}$, which is absurd (take $x = 1$, $y = -1$). Thus $q$ is not positive, and thus not
positive definite either.

b) It is clear that $q(x, y, z) \ge 0$ for all $x, y, z \in \mathbb{R}$, since $q(x, y, z)$ is a sum of
squares of real numbers. Thus $q$ is positive. To see whether $q$ is positive definite,
we need to investigate when $q(x, y, z) = 0$. This forces

$$x = y - z = z - x = 0$$

and then $x = y = z = 0$. Thus $q$ is positive definite.
c) We observe that

$$q(x, y, z) = \frac{(x - y)^2 + (y - z)^2 + (z - x)^2}{2}$$

for all $x, y, z \in \mathbb{R}$, thus $q$ is positive. Notice that $q$ is not positive definite, since
$q(1, 1, 1) = 0$, but $(1, 1, 1) \neq (0, 0, 0)$. □


We introduce now another fundamental concept, which will be constantly used 
in the sequel: 


Definition 10.13. a) An inner product on an $\mathbb{R}$-vector space $V$ is a symmetric
positive definite bilinear form on $V$.

b) A Euclidean space is a finite dimensional $\mathbb{R}$-vector space $V$ endowed with an
inner product.


We warn the reader that some authors do not impose that a Euclidean space
is finite dimensional. When dealing with inner products and Euclidean spaces, the
notation $\langle x, y \rangle$ is preferred to $b(x, y)$ (where $b$ is the inner product on $V$). If $\langle\ ,\ \rangle$ is
an inner product on $V$, we let

$$\|x\| = \sqrt{\langle x, x \rangle}$$

and we call $\|x\|$ the norm of $x$ (the reason for this name will be given a bit later).
Remark 10.14. a) If $V$ is a Euclidean space, then any subspace $W$ of $V$ is
naturally a Euclidean space, when endowed with the restriction of the inner
product on $V$ to $W$: note that this restriction is still an inner product on $W$, by
definition.

b) $\mathbb{R}^n$ endowed with the canonical inner product

$$\langle (x_1, \dots, x_n), (y_1, \dots, y_n) \rangle = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$$

is a Euclidean space. We leave it to the reader to check this assertion.


Problem 10.15. Let $n$ be a positive integer and let $V$ be the space of polynomials
with real coefficients whose degree does not exceed $n$. Prove that

$$\langle P, Q \rangle = \sum_{i=0}^{n} P(i) Q(i)$$

defines an inner product on $V$.

Solution. First, it is clear that $\langle P, Q \rangle = \langle Q, P \rangle$, that for any $P$ the map $Q \mapsto \langle P, Q \rangle$ is linear, and
similarly that for any $Q$ the map $P \mapsto \langle P, Q \rangle$ is linear. Next, we have

$$\langle P, P \rangle = \sum_{i=0}^{n} P(i)^2$$

and the last quantity is clearly nonnegative. Finally, assume that $\langle P, P \rangle = 0$ for
some $P \in V$. Then $\sum_{i=0}^{n} P(i)^2 = 0$, which forces $P(i) = 0$ for all $0 \le i \le n$.
Thus $P$ has at least $n + 1$ distinct roots and since $\deg P \le n$, we deduce that $P = 0$.
The result follows. □
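The degree bound $\deg P \le n$ is exactly what makes this form definite. A nonzero polynomial of degree $n + 1$ can vanish at all of $0, 1, \dots, n$. The following small Python sketch illustrates this, representing polynomials as coefficient lists with the constant term first. The helpers `evaluate` and `inner` are illustrative names.

```python
def evaluate(P, t):
    """Value at t of the polynomial with coefficients P = [p_0, p_1, ...] (Horner's rule)."""
    value = 0
    for coeff in reversed(P):
        value = value * t + coeff
    return value


def inner(P, Q, n):
    """<P, Q> = sum_{i=0}^{n} P(i) Q(i)."""
    return sum(evaluate(P, i) * evaluate(Q, i) for i in range(n + 1))


n = 2
P = [1, -3, 2]          # 1 - 3X + 2X^2, degree 2 <= n
W = [0, 2, -3, 1]       # X(X - 1)(X - 2) = X^3 - 3X^2 + 2X, degree 3 > n
```

Here `inner(P, P, n)` equals $1 + 0 + 9 = 10 > 0$, while `inner(W, W, n)` is 0 although $W \neq 0$. On polynomials of degree at most $n + 1$, the same formula is therefore no longer an inner product.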


Problem 10.16. Let $V$ be the space of continuous real-valued maps on $[a, b]$
(where $a < b$ are fixed real numbers). Prove that the map $\langle\ ,\ \rangle$ defined by

$$\langle f, g \rangle = \int_a^b f(x) g(x)\, dx$$

is an inner product on $V$.

Solution. It is easy to see that $\langle\ ,\ \rangle$ is a symmetric bilinear form; it remains to check
that it is positive definite. Since $f^2$ is a continuous nonnegative map, we have

$$\langle f, f \rangle = \int_a^b f(x)^2\, dx \ge 0.$$

Suppose that $\langle f, f \rangle = 0$ and that $f$ is nonzero. Thus there is $x_0 \in (a, b)$ such that
$f(x_0) \neq 0$. By continuity we can find $\varepsilon > 0$ such that $(x_0 - \varepsilon, x_0 + \varepsilon) \subset [a, b]$ and
$|f(x)| \ge \varepsilon$ for $x \in (x_0 - \varepsilon, x_0 + \varepsilon)$. It follows that

$$\langle f, f \rangle \ge \int_{x_0 - \varepsilon}^{x_0 + \varepsilon} f(x)^2\, dx \ge 2\varepsilon^3 > 0,$$

a contradiction, and the result follows. □


Problem 10.17. Let $V$ be the space of smooth functions $f : [0, 1] \to \mathbb{R}$ such that
$f(0) = f(1) = 0$. Prove that

$$\langle f, g \rangle = -\int_0^1 \big(f(x) g''(x) + f''(x) g(x)\big)\, dx$$

defines an inner product on $V$.

Solution. Using integration by parts, we obtain

$$\langle f, g \rangle = -\big(f g' + f' g\big)\Big|_0^1 + 2\int_0^1 f'(x) g'(x)\, dx = 2\int_0^1 f'(x) g'(x)\, dx,$$

since $f$ and $g$ vanish at 0 and 1.
The last formula makes it clear that $\langle\ ,\ \rangle$ is a symmetric bilinear form on $V$.
It remains to see that it is positive definite. We have

$$\langle f, f \rangle = 2\int_0^1 f'(x)^2\, dx \ge 0,$$

with equality if and only if (by the previous problem) $f'(x) = 0$ for all $x$. This
last condition is equivalent to saying that $f$ is constant. But since $f$ vanishes by
assumption at 0, if $f$ is constant then it must be the zero map. Thus $\langle f, f \rangle = 0$
implies $f = 0$, which yields the desired result. □


The fundamental result concerning positive symmetric bilinear forms is the following:

Theorem 10.18 (Cauchy–Schwarz Inequality). Let $b : V \times V \to \mathbb{R}$ be a
symmetric bilinear form and let $q$ be its associated quadratic form.

a) If $b$ is positive, then for all $x, y \in V$ we have

$$b(x, y)^2 \le q(x) q(y).$$

b) If moreover $b$ is positive definite and if $b(x, y)^2 = q(x) q(y)$ for some $x, y \in V$,
then $x$ and $y$ are linearly dependent.


Proof. a) Consider the map $F : \mathbb{R} \to \mathbb{R}$ given by

$$F(t) = q(x + ty).$$

Note that since $b$ is bilinear and symmetric, we have
$$F(t) = b(x + ty, x + ty) = b(x, x) + b(x, ty) + b(ty, x) + b(ty, ty)$$
$$= q(x) + t b(x, y) + t b(x, y) + t^2 b(y, y) = q(x) + 2t b(x, y) + t^2 q(y).$$

Moreover, since $b$ is positive, we have $F(t) \ge 0$ for all $t \in \mathbb{R}$. If $q(y) = 0$, then
$F(t) = q(x) + 2t b(x, y)$ is an affine function which is nonnegative on all of $\mathbb{R}$; this forces
$b(x, y) = 0$, and the inequality holds. If $q(y) > 0$, then $F$ is a quadratic polynomial function
with positive leading coefficient $q(y)$ taking only nonnegative values. It follows that the
discriminant of $F$ is nonpositive, that is

$$4b(x, y)^2 - 4q(x) q(y) \le 0.$$

But this is precisely the desired inequality (after division by 4).

b) Suppose that $b$ is positive definite and that $b(x, y)^2 = q(x) q(y)$. We may assume
that $y \neq 0$ (otherwise there is nothing to prove), so that $q(y) > 0$. Thus with the notations used in the proof of part a),
the discriminant of $F$ is 0. It follows that $F$ has a unique real root, say $t$. Then
$q(x + ty) = 0$ and since $q$ is positive definite, this can only happen if $x + ty = 0$.
Thus $x$ and $y$ are linearly dependent. □
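The proof can be replayed on concrete vectors. In the Python sketch below, which uses the canonical inner product on $\mathbb{R}^n$ with integer vectors, `discriminant(x, y)` returns $4b(x, y)^2 - 4q(x)q(y)$. This is the discriminant of $F(t) = q(x) + 2t b(x, y) + t^2 q(y)$. The function names are ours. The result is negative for non-proportional vectors and zero for proportional ones. In the proportional case, the double root $t = -b(x, y)/q(y)$ gives $x + ty = 0$, as in part b).

```python
from fractions import Fraction


def dot(x, y):
    """Canonical inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def discriminant(x, y):
    """Discriminant of F(t) = q(x + t y) = q(x) + 2 t b(x, y) + t^2 q(y)."""
    return 4 * dot(x, y) ** 2 - 4 * dot(x, x) * dot(y, y)


x, y = [1, 2, 3], [4, 5, 6]                # not proportional
u, v = [1, 2], [2, 4]                      # proportional: v = 2u
t = Fraction(-dot(u, v), dot(v, v))        # the double root of F for the pair (u, v)
w = [a + t * b for a, b in zip(u, v)]      # u + t v
```

With these values, `discriminant(x, y)` is $4 \cdot 32^2 - 4 \cdot 14 \cdot 77 = -216$, while `discriminant(u, v)` is 0, $t = -1/2$ and `w` is the zero vector.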




The following result is a direct consequence of the previous theorem, but it is of
fundamental importance:

Corollary 10.19. If $V$ is a vector space over $\mathbb{R}$ endowed with an inner product
$\langle\ ,\ \rangle$, then for all $x, y \in V$ we have

$$|\langle x, y \rangle| \le \|x\| \cdot \|y\|.$$


Example 10.20. a) Let $V = \mathbb{R}^n$ be endowed with the canonical inner product.
The inequality $|\langle x, y \rangle| \le \|x\| \cdot \|y\|$ can be rewritten (after squaring) as

$$(x_1 y_1 + \dots + x_n y_n)^2 \le (x_1^2 + \dots + x_n^2)(y_1^2 + \dots + y_n^2).$$

b) Let $V$ be the space of continuous real-valued maps on $[a, b]$, where $a < b$ are
real numbers. The map $\langle\ ,\ \rangle : V \times V \to \mathbb{R}$ defined by

$$\langle f, g \rangle = \int_a^b f(x) g(x)\, dx$$

is an inner product on $V$ (see Problem 10.16) and the inequality in the corollary
becomes (after squaring)

$$\Big(\int_a^b f(x) g(x)\, dx\Big)^2 \le \Big(\int_a^b f(x)^2\, dx\Big) \cdot \Big(\int_a^b g(x)^2\, dx\Big).$$
Let $V$ be a vector space over $\mathbb{R}$, endowed with an inner product $\langle\ ,\ \rangle$. By the
previous corollary we have, for all nonzero vectors $u, v \in V$,

$$-1 \le \frac{\langle u, v \rangle}{\|u\|\, \|v\|} \le 1.$$

Thus there exists a unique angle $\theta \in [0, \pi]$ satisfying

$$\cos\theta = \frac{\langle u, v \rangle}{\|u\|\, \|v\|}.$$

We define this angle $\theta$ to be the angle between the vectors $u, v$.
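For the canonical inner product on $\mathbb{R}^n$, the definition of the angle takes only a few lines of Python. The sketch below uses floating point, and the function name is our own. The clamp to $[-1, 1]$ guards only against rounding errors, since the corollary already guarantees that the quotient lies in that interval.

```python
import math


def angle(u, v):
    """Angle in [0, pi] between nonzero vectors u, v for the canonical inner product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    c = dot / (norm_u * norm_v)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp: rounding may push c slightly outside [-1, 1]


theta = angle([1, 0], [1, 1])   # cos(theta) = 1/sqrt(2), so theta = pi/4
```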
An important consequence of the Cauchy–Schwarz inequality is the following:

Theorem 10.21 (Minkowski’s Inequality). Let $V$ be a vector space over $\mathbb{R}$ and
let $q$ be a positive quadratic form on $V$. Then for all $x, y \in V$ we have

$$\sqrt{q(x)} + \sqrt{q(y)} \ge \sqrt{q(x + y)}.$$

Proof. Squaring the inequality we obtain the equivalent one
$$q(x) + q(y) + 2\sqrt{q(x) q(y)} \ge q(x + y).$$
Letting $b$ be the polar form of $q$, the polarization identity yields
$$q(x + y) = q(x) + q(y) + 2b(x, y).$$

Comparing this equality and the previous inequality, we obtain the equivalent form

$$\sqrt{q(x) q(y)} \ge b(x, y),$$

which follows from the Cauchy–Schwarz inequality, since $b(x, y) \le |b(x, y)| \le \sqrt{q(x) q(y)}$. □
Consider now an inner product $\langle\ ,\ \rangle$ on some $\mathbb{R}$-vector space $V$. Recall that we
defined
$$\|\cdot\| : V \to \mathbb{R}, \quad \|x\| = \sqrt{\langle x, x \rangle}.$$

Since an inner product is positive definite, we see that $\|x\| \ge 0$ for all $x$, with
equality if and only if $x = 0$. Also, since an inner product is a bilinear form, we
have $\|ax\| = |a|\, \|x\|$ for all $a \in \mathbb{R}$. Finally, Minkowski’s inequality yields

$$\|x + y\| \le \|x\| + \|y\|$$

for all $x, y \in V$. We call this inequality the triangle inequality.
A map $\|\cdot\| : V \to \mathbb{R}$ satisfying the following properties:

• $\|v\| \ge 0$ for all $v \in V$, with equality if and only if $v = 0$.
• $\|av\| = |a| \cdot \|v\|$ for all $v \in V$ and $a \in \mathbb{R}$.
• $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$

is called a norm on $V$. This explains why we called $\|x\|$ the norm of $x$.

Minkowski’s inequality shows that any inner product on a vector space V over 
R naturally endows the space V with a norm || - ||. We can use this norm to define a 
distance d : V x V > Rt by 


d(u, v) = ||u— vl]. 
One can check (see the exercise section) that for all u, v,w € V we have 
d(u,v) + d(v,w) > d(u,w). 
This construction is of fundamental importance, since it allows us to do analysis on V as we do it on $\mathbb{R}$. Note that if $V = \mathbb{R}^n$ with $n \leq 3$, endowed with its canonical inner product, then the distance obtained as above is really the Euclidean distance that we are used to on the line, in the plane and in three-dimensional space. For instance, the distance between the points (1, 1) and (2, 3) is

$$d((1,1),(2,3)) = \sqrt{(1-2)^2 + (1-3)^2} = \sqrt{5},$$

and this really corresponds to the geometric distance between these two points in the plane.
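To make the construction concrete, here is a minimal Python sketch of the canonical inner product on $\mathbb{R}^n$, the induced norm and the distance $d(u,v) = \|u - v\|$. The function names `inner`, `norm` and `distance` are hypothetical helpers chosen for this illustration, and ordinary floating-point arithmetic is assumed.

```python
import math

def inner(x, y):
    """Canonical inner product <x, y> = x_1 y_1 + ... + x_n y_n on R^n."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product: ||x|| = sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

def distance(u, v):
    """Distance d(u, v) = ||u - v||."""
    return norm([a - b for a, b in zip(u, v)])

print(distance((1, 1), (2, 3)))  # approximately 2.2360679..., i.e. sqrt(5)

# Spot-check of the triangle inequality d(u, v) + d(v, w) >= d(u, w).
u, v, w = (1, 1, 1), (1, 2, 3), (0, -1, 4)
print(distance(u, v) + distance(v, w) >= distance(u, w))  # True
```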


10.2.1 Practice Problems 


In the following problems, whenever the inner product on $\mathbb{R}^n$ is not specified, it is implicitly assumed that we consider the canonical inner product on $\mathbb{R}^n$, defined by

$$\langle (x_1, \ldots, x_n), (y_1, \ldots, y_n)\rangle = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$


1. Let V be an $\mathbb{R}$-vector space endowed with an inner product $\langle\,,\rangle$. Recall that the distance between two points $x, y \in V$ is

$$d(x,y) = \sqrt{\langle x - y, x - y\rangle}.$$

Prove the triangle inequality

$$d(x,y) + d(y,z) \geq d(x,z)$$

for all $x, y, z \in V$.

2. a) What is the distance between the vectors $u = (1,1,1)$ and $v = (1,2,3)$ in $\mathbb{R}^3$?

b) What are their respective norms?

c) What is the angle between u and v?


3. Find the angle between the vectors $u = (1,2)$ and $v = (1,-1)$ in $\mathbb{R}^2$.
4. Find the vectors v of norm 1 in $\mathbb{R}^3$ such that the angle between v and (1, 1, 0) is $\frac{\pi}{4}$.


5. Among all vectors of the form $(1, x, 2, 1)$ with $x \in \mathbb{R}$, which vector is at smallest distance from $(0, 1, 1, 1)$?


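6. Find all values of $\alpha \in \mathbb{R}$ for which the map $\langle\,,\rangle : \mathbb{R}^4 \times \mathbb{R}^4 \to \mathbb{R}$ defined by

$$\langle (x_1, x_2, x_3, x_4), (y_1, y_2, y_3, y_4)\rangle = \alpha x_1y_1 + 2x_2y_2 + (1 - \alpha)x_3y_3 + x_4y_4$$

is an inner product.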


7. Prove that if $f : [a,b] \to \mathbb{R}$ is a continuous map, then

$$\left(\int_a^b f(t)\,dt\right)^2 \leq (b-a)\int_a^b f(t)^2\,dt.$$


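8. a) Prove that if $x_1, \ldots, x_n$ are positive real numbers, then

$$(x_1 + x_2 + \cdots + x_n)\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right) \geq n^2.$$

When do we have equality?
b) Prove that if $f : [a,b] \to (0,\infty)$ is a continuous map, then

$$\int_a^b f(t)\,dt \cdot \int_a^b \frac{dt}{f(t)} \geq (b-a)^2.$$

When do we have equality?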


9. Let $f : [0,1] \to \mathbb{R}_+$ be a continuous map taking nonnegative values and let

$$x_n = \int_0^1 t^n f(t)\,dt.$$

Prove that for all $n, p \geq 0$

$$x_{n+p} \leq \sqrt{x_{2n}} \cdot \sqrt{x_{2p}}.$$


10. Let V be a $\mathbb{C}$-vector space and let Φ be a hermitian quadratic form on V. Assume that Φ is positive definite, i.e., $\Phi(x) > 0$ for all nonzero vectors $x \in V$. Let φ be the polar form of Φ.
a) Prove the Cauchy–Schwarz inequality: for all $x, y \in V$ we have

$$|\varphi(x,y)|^2 \leq \Phi(x)\Phi(y),$$

with equality if and only if x, y are linearly dependent.
b) Prove Minkowski’s inequality: for all $x, y \in V$ we have

$$\sqrt{\Phi(x)} + \sqrt{\Phi(y)} \geq \sqrt{\Phi(x+y)}.$$


11. Prove that there is no inner product $\langle\,,\rangle$ on the space V of continuous real-valued maps on [0, 1] such that for all $f \in V$ we have

$$\langle f, f\rangle = \sup_{x \in [0,1]} f(x)^2.$$

Hint: the parallelogram law.


10.3 Bilinear Forms and Matrices 


From now on we will focus only on finite dimensional vector spaces V over R. 
We have already seen that we can describe linear transformations on V in terms of 
matrices. We would like to have a similar description for bilinear forms. 


Definition 10.22. a) Consider a basis $e_1, \ldots, e_n$ of V, and let b be a symmetric bilinear form on V. The matrix of b with respect to the basis $e_1, \ldots, e_n$ is the matrix $(b(e_i, e_j))_{1 \leq i,j \leq n}$.

b) If q is a quadratic form on V, the matrix of q with respect to the basis $e_1, \ldots, e_n$ is the matrix of its polar form with respect to $e_1, \ldots, e_n$.


Theorem 10.23. Let V be a finite dimensional vector space and let $e_1, \ldots, e_n$ be a basis of V. Sending a symmetric bilinear form to its matrix with respect to $e_1, \ldots, e_n$ establishes an isomorphism of $\mathbb{R}$-vector spaces between the vector space of symmetric bilinear forms on V and the vector space of symmetric matrices in $M_n(\mathbb{R})$.


Proof. It is clear that if A is the matrix of b and A′ is the matrix of b′, then $cA + A'$ is the matrix of $cb + b'$ for all scalars $c \in \mathbb{R}$. Also, since b is symmetric, we have $b(e_i, e_j) = b(e_j, e_i)$, thus the matrix of b is symmetric. Thus sending a symmetric bilinear form b to its matrix A with respect to $e_1, \ldots, e_n$ induces a linear map φ from symmetric bilinear forms on V to symmetric matrices $A \in M_n(\mathbb{R})$.

Injectivity of the map φ follows directly from Remark 10.2, so it remains to prove that φ is surjective. Start with any symmetric matrix $A = [a_{ij}]$. If $x = x_1e_1 + \cdots + x_ne_n$ and $y = y_1e_1 + \cdots + y_ne_n$ are vectors in V, define

$$b(x,y) = \sum_{i,j=1}^n a_{ij}x_iy_j.$$

It is easy to see that b is a symmetric bilinear form whose matrix in the basis $e_1, \ldots, e_n$ is precisely A. □


A natural question is the following: what, explicitly, is the inverse of the isomorphism given by the previous theorem? Fortunately, this has already been answered during the proof: it is the map sending a symmetric matrix $A = [a_{ij}]$ to the bilinear form b defined by

$$b(x_1e_1 + \cdots + x_ne_n, y_1e_1 + \cdots + y_ne_n) = \sum_{i,j=1}^n a_{ij}x_iy_j.$$

This formula does not come out of nowhere: it is imposed by Remark 10.2. Also, note that the right-hand side of the previous equality can be written as ${}^tXAY$, where X, Y are the column vectors whose coordinates are $x_1, \ldots, x_n$, respectively $y_1, \ldots, y_n$. Here we consider ${}^tX$ as a $1 \times n$ matrix and Y as an $n \times 1$ matrix, so that ${}^tXAY$ is a $1 \times 1$ matrix, that is, a real number. We obtain the following characterization of the matrix of b with respect to the basis $e_1, \ldots, e_n$.


Theorem 10.24. Let $e_1, e_2, \ldots, e_n$ be a basis of V and let b be a symmetric bilinear form on V. The matrix of b with respect to $e_1, \ldots, e_n$ is the unique symmetric matrix $A \in M_n(\mathbb{R})$ such that

$$b(x,y) = {}^tXAY$$

for all vectors $x, y \in V$ (where X, Y are the column vectors whose coordinates are those of x, y with respect to $e_1, \ldots, e_n$).


Remark 10.25. Keep the hypotheses and notations of the previous theorem and discussion. If q is the quadratic form attached to b, then

$$q(x_1e_1 + \cdots + x_ne_n) = {}^tXAX = \sum_{i,j=1}^n a_{ij}x_ix_j = \sum_{i=1}^n a_{ii}x_i^2 + 2\sum_{1 \leq i < j \leq n} a_{ij}x_ix_j,$$

the last equality being a consequence of the equality $a_{ij} = a_{ji}$ for $i < j$. The presence of the factor 2 is quite often a source of errors when dealing with the link between quadratic forms and matrices. Indeed, it is quite tempting (and this happens quite often!) to say that the quadratic form associated with the matrix $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is

$$q(x_1, x_2) = x_1x_2,$$

which is wrong: the quadratic form associated with A is

$$q(x_1, x_2) = 2x_1x_2.$$
An even more common mistake is to say that the matrix associated with the quadratic form

$$q(x,y,z) = xy + yz + zx$$

on $\mathbb{R}^3$ is $\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$. Actually, the correct matrix is $\begin{bmatrix} 0 & \frac12 & \frac12 \\ \frac12 & 0 & \frac12 \\ \frac12 & \frac12 & 0 \end{bmatrix}$, since the polar form of q is the bilinear form b defined by

$$b((x,y,z),(x',y',z')) = \frac{xy' + yx' + xz' + zx' + yz' + zy'}{2}.$$
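The factor 2 discussed above can be handled mechanically by computing the polar form through the polarization identity $b(x,y) = \frac{1}{2}\bigl(q(x+y) - q(x) - q(y)\bigr)$. The following Python sketch (helper names are our own; exact arithmetic via `fractions.Fraction`) recovers the matrix of $q(x,y,z) = xy + yz + zx$ and shows that the naive matrix with 1's off the diagonal represents 2q rather than q.

```python
from fractions import Fraction

def polar_form(q):
    """Polar form b(x, y) = (q(x + y) - q(x) - q(y)) / 2 of a quadratic form q."""
    def b(x, y):
        s = [a + c for a, c in zip(x, y)]
        return Fraction(q(s) - q(x) - q(y), 2)
    return b

def matrix_of(q, n):
    """Matrix [b(e_i, e_j)] of q in the canonical basis e_1, ..., e_n of R^n."""
    b = polar_form(q)
    e = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    return [[b(e[i], e[j]) for j in range(n)] for i in range(n)]

def quadratic_from_matrix(A, x):
    """Evaluate tX A X = sum over i, j of a_ij x_i x_j."""
    n = len(A)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def q(v):
    x, y, z = v
    return x * y + y * z + z * x

A = matrix_of(q, 3)
for row in A:
    print([str(a) for a in row])
# ['0', '1/2', '1/2']
# ['1/2', '0', '1/2']
# ['1/2', '1/2', '0']

# The naive matrix with 1's off the diagonal gives 2q, not q.
naive = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(quadratic_from_matrix(naive, (1, 2, 3)), q((1, 2, 3)))  # 22 11
```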


Armed with Theorem 10.24, it is not difficult to understand how the matrix of a bilinear form varies when we vary the basis. More precisely, consider two bases $e_1, \ldots, e_n$ and $e_1', \ldots, e_n'$ of V and let A, A′ be the matrices of a symmetric bilinear form b with respect to these bases. If $x = x_1e_1 + \cdots + x_ne_n = x_1'e_1' + \cdots + x_n'e_n'$ is a vector in V, let X (respectively X′) be the column vector whose coordinates are $x_1, \ldots, x_n$ (respectively $x_1', \ldots, x_n'$), and define Y, Y′ similarly for a vector $y \in V$. Then

$$b(x,y) = {}^tXAY = {}^tX'A'Y'.$$

Letting P be the change of basis matrix from $e_1, \ldots, e_n$ to $e_1', \ldots, e_n'$ (recall that the columns of P are the coordinates of $e_1', \ldots, e_n'$ when expressed in terms of $e_1, \ldots, e_n$), we have

$$X = PX', \quad Y = PY'.$$

It follows that

$${}^tX'A'Y' = b(x,y) = {}^tXAY = {}^t(PX')A(PY') = {}^tX'\,({}^tPAP)\,Y'$$

for all X′, Y′. Since ${}^tPAP$ is symmetric, the uniqueness statement in Theorem 10.24 gives the following


Theorem 10.26. Suppose that a symmetric bilinear form b has matrix A with respect to a basis $e_1, \ldots, e_n$ of V. Let $e_1', \ldots, e_n'$ be another basis of V and let P be the change of basis matrix from $e_1, \ldots, e_n$ to $e_1', \ldots, e_n'$. Then the matrix of b with respect to $e_1', \ldots, e_n'$ is

$$A' = {}^tPAP.$$
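Theorem 10.26 can be checked on a small example. In the Python sketch below, the matrices A and P are arbitrary choices of ours (P is invertible, and its columns give the new basis vectors in old coordinates). The code computes $A' = {}^tPAP$, confirms that $b(x,y)$ takes the same value in both coordinate systems, and illustrates that $\det A'$ and $\det A$ differ by the factor $(\det P)^2$.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def bilinear(A, X, Y):
    """b(x, y) = tX A Y, with X, Y the coordinate vectors of x, y."""
    return sum(X[i] * A[i][j] * Y[j] for i in range(len(X)) for j in range(len(Y)))

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1, 2], [2, -3]]   # matrix of b in the old basis (arbitrary example)
P = [[1, 1], [0, 2]]    # columns = new basis vectors in old coordinates
A_new = mat_mul(mat_mul(transpose(P), A), P)   # A' = tP A P
print(A_new)                                    # [[1, 5], [5, -3]]

Xn, Yn = [3, -1], [2, 5]           # coordinates X', Y' in the new basis
X, Y = apply(P, Xn), apply(P, Yn)  # X = P X', Y = P Y'
print(bilinear(A, X, Y), bilinear(A_new, Xn, Yn))  # 86 86

# The determinants differ by (det P)^2 = 4, so their sign is the same.
print(det2(A), det2(A_new))  # -7 -28
```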


The previous theorem suggests the following

Definition 10.27. Two symmetric matrices $A, B \in M_n(\mathbb{R})$ are called congruent if they are the matrices of some symmetric bilinear form in two bases of $\mathbb{R}^n$. Equivalently, A and B are congruent if there is an invertible matrix $P \in M_n(\mathbb{R})$ such that $B = {}^tPAP$.


The congruence relation is an equivalence relation on the set of symmetric matrices $A \in M_n(\mathbb{R})$, that is:

• any matrix A is congruent to itself (this is clear);

• if A is congruent to B, then B is congruent to A: indeed, if $B = {}^tPAP$, then $A = {}^t(P^{-1})BP^{-1}$;

• if A is congruent to B and B is congruent to C, then A is congruent to C. Indeed, if $B = {}^tPAP$ and $C = {}^tQBQ$, then $C = {}^t(PQ)A(PQ)$.


Note that two congruent matrices have the same rank, since multiplying by invertible matrices does not change the rank. This allows us to define the rank of a symmetric bilinear form as the rank of its matrix in any basis of the surrounding space (the previous discussion shows that it is independent of the choice of the basis). Note that we cannot define the determinant of a symmetric bilinear form in a similar way: if A and B are congruent matrices, then in general det A ≠ det B. All we can say is that if $B = {}^tPAP$, then

$$\det B = \det({}^tP)\det A\det P = \det A \cdot (\det P)^2,$$

thus det A and det B differ by the square of a nonzero real number. In particular, they have the same sign. The discriminant of a symmetric bilinear form is defined to be the sign of the determinant of its matrix in a basis of the surrounding space (it is independent of the choice of the basis).

The fundamental theorem concerning the congruence relation is the following consequence of Theorem 10.9:

Theorem 10.28 (Gauss). Any symmetric matrix $A \in M_n(\mathbb{R})$ is congruent to a diagonal matrix.

Proof. Consider the associated quadratic form q on $V = \mathbb{R}^n$:

$$q(X) = {}^tXAX, \quad \text{i.e.,} \quad q(x_1, \ldots, x_n) = \sum_{i,j=1}^n a_{ij}x_ix_j.$$

By Theorem 10.26, it suffices to prove the existence of a basis of $\mathbb{R}^n$ with respect to which the matrix of q is diagonal (as then A will be congruent to the corresponding diagonal matrix).

By Theorem 10.9 we know that we can find real numbers $\alpha_1, \ldots, \alpha_r$ and linearly independent linear forms $l_1, \ldots, l_r \in V^*$ such that

$$q(X) = \sum_{i=1}^r \alpha_i l_i(X)^2$$

for all $X \in V$. Complete the family $(l_1, \ldots, l_r)$ to a basis $(l_1, \ldots, l_n)$ of $V^*$. By Theorem 6.13 there is a basis $(e_1, \ldots, e_n)$ of V whose dual basis is $(l_1, \ldots, l_n)$. Thus, if $X = x_1e_1 + \cdots + x_ne_n \in V$, then $l_i(X) = x_i$ and

$$q(X) = \sum_{i=1}^r \alpha_i l_i(X)^2 = \sum_{i=1}^r \alpha_i x_i^2,$$

so that the matrix of q with respect to the basis $(e_1, \ldots, e_n)$ is the diagonal matrix D with diagonal entries $\alpha_1, \ldots, \alpha_r, 0, \ldots, 0$. □
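In practice, a congruence to a diagonal matrix can be found by symmetric Gaussian elimination: each column operation is immediately followed by the same row operation, so that A is replaced by ${}^tEAE$ for an invertible elementary matrix E. This is a standard computational route to Gauss’s theorem, not a transcription of the proof above (which goes through Theorem 10.9). The Python sketch below (function names are hypothetical; exact rational arithmetic is assumed) returns an invertible P and a diagonal D with $D = {}^tPAP$.

```python
from fractions import Fraction

def congruence_diagonalize(A):
    """Return (P, D) with P invertible, D diagonal and D == tP A P.

    Each column operation on A is followed by the matching row operation,
    i.e. A is replaced by tE A E; the column operations are accumulated in P.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def add_multiple(src, dst, c):
        # Columns: C_dst += c * C_src (on A and P); then rows: R_dst += c * R_src (on A).
        for M in (A, P):
            for row in M:
                row[dst] += c * row[src]
        A[dst] = [a + c * s for a, s in zip(A[dst], A[src])]

    def swap(i, j):
        # Swap columns i and j (on A and P), then rows i and j (on A).
        for M in (A, P):
            for row in M:
                row[i], row[j] = row[j], row[i]
        A[i], A[j] = A[j], A[i]

    for k in range(n):
        if A[k][k] == 0:
            j = next((j for j in range(k + 1, n) if A[j][j] != 0), None)
            if j is not None:
                swap(k, j)
            else:
                j = next((j for j in range(k + 1, n) if A[k][j] != 0), None)
                if j is None:
                    continue            # row and column k are already zero
                add_multiple(j, k, 1)   # new A[k][k] = 2 * A[k][j] != 0
        for i in range(k + 1, n):
            if A[i][k] != 0:
                add_multiple(k, i, -A[i][k] / A[k][k])
    return P, A

def transpose(M):
    return [list(row) for row in zip(*M)]

def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def is_diagonal(D):
    return all(D[i][j] == 0 for i in range(len(D)) for j in range(len(D)) if i != j)

# Zero diagonal: an off-diagonal entry must be used at the first step.
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
P, D = congruence_diagonalize(A)
print([str(D[i][i]) for i in range(3)])                           # ['2', '-1/2', '-2']
print(is_diagonal(D), mat_mul(mat_mul(transpose(P), A), P) == D)  # True True
```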


Remark 10.29. The proof also shows the following interesting fact: if q is a quadratic form on $\mathbb{R}^n$ with polar form b, then we can find a basis $f_1, \ldots, f_n$ of $\mathbb{R}^n$ such that

$$b(f_i, f_j) = 0 \quad \text{for all } 1 \leq i \neq j \leq n.$$

Such a basis is called a q-orthogonal basis of $\mathbb{R}^n$. We can even impose that $q(f_i) \in \{-1, 0, 1\}$ for all $1 \leq i \leq n$. Indeed, as in the above proof we can write

$$q(X) = \sum_{i=1}^r \alpha_i l_i(X)^2,$$

and by the discussion following Theorem 10.9 we can even ensure that $\alpha_i \in \{-1, 1\}$ for all $1 \leq i \leq r$. If $e_1, \ldots, e_n$ is a basis as in the above proof, then

$$q(X) = \sum_{i=1}^r \alpha_i x_i^2$$

for $X = x_1e_1 + \cdots + x_ne_n$, thus, for $Y = y_1e_1 + \cdots + y_ne_n$,

$$b(X, Y) = \sum_{i=1}^n \alpha_i x_iy_i$$

with $\alpha_i = 0$ for $r < i \leq n$. It follows that we can take $f_i = e_i$ for $1 \leq i \leq n$.
We introduce one more definition before ending this section:

Definition 10.30. A symmetric matrix $A \in M_n(\mathbb{R})$ is called positive if ${}^tXAX \geq 0$ for all $X \in \mathbb{R}^n$. It is called positive definite if ${}^tXAX > 0$ for all nonzero vectors $X \in \mathbb{R}^n$.

In other words, $A = [a_{ij}]$ is positive (respectively positive definite) if and only if the quadratic form associated with A, namely $(x_1, \ldots, x_n) \mapsto \sum_{i,j=1}^n a_{ij}x_ix_j$, is positive (respectively positive definite). Any symmetric positive definite matrix A gives rise to an inner product $\langle\,,\rangle_A$ on $\mathbb{R}^n$, defined by

$$\langle X, Y\rangle_A = \langle X, AY\rangle = {}^tXAY,$$

where $\langle\,,\rangle$ is the canonical inner product on $\mathbb{R}^n$.




Note that if A is positive then, letting $e_1, \ldots, e_n$ be the canonical basis of $\mathbb{R}^n$, we have

$$a_{ii} = {}^te_iAe_i \geq 0,$$

and if A is positive definite then the inequality is strict. Also, note that any matrix congruent to a positive (respectively positive definite) symmetric matrix is itself positive (respectively positive definite), since for any invertible P we have

$${}^tX({}^tPAP)X = {}^t(PX)A(PX),$$

and $PX \neq 0$ whenever $X \neq 0$.


Problem 10.31. Let $A \in M_n(\mathbb{R})$ be any matrix.

a) Prove that ${}^tAA$ is symmetric and positive.
b) Prove that ${}^tAA$ is positive definite if and only if A is invertible.

Solution. Note that

$${}^t({}^tAA) = {}^tA \cdot {}^t({}^tA) = {}^tAA,$$

thus ${}^tAA$ is symmetric. Next, for all $X \in \mathbb{R}^n$ we have

$${}^tX({}^tAA)X = {}^t(AX)(AX) = \|AX\|^2 \geq 0,$$

with equality if and only if AX = 0. Both a) and b) follow from these observations (and the fact that A is invertible if and only if AX = 0 implies X = 0). □

Remark 10.32. The same result holds with $A\,{}^tA$ instead of ${}^tAA$.

Remarkably, the converse of the result established in the previous problem holds:

Theorem 10.33. Any symmetric positive matrix $A \in M_n(\mathbb{R})$ can be written as ${}^tBB$ for some matrix $B \in M_n(\mathbb{R})$.

Proof. We use Gauss’ Theorem 10.28. By that theorem, there is an invertible matrix P such that ${}^tPAP = D$ is a diagonal matrix. By the discussion preceding Problem 10.31 we know that D itself is positive and its diagonal coefficients $d_{ii}$ are nonnegative. Hence we can write $D = {}^tD_1D_1$ for a diagonal matrix $D_1$ whose diagonal entries are $\sqrt{d_{ii}}$. But then

$$A = {}^tP^{-1}DP^{-1} = {}^tP^{-1}\,{}^tD_1D_1P^{-1} = {}^tBB,$$

with $B = D_1P^{-1}$. □


Problem 10.34. Let $(V, \langle\,,\rangle)$ be a Euclidean space and let $v_1, \ldots, v_n$ be a family of vectors in V. Let $A \in M_n(\mathbb{R})$ be the Gram matrix of the family, i.e., the matrix whose (i, j)-entry is $\langle v_i, v_j\rangle$.

a) Prove that A is symmetric positive.
b) Prove that A is positive definite if and only if $v_1, \ldots, v_n$ are linearly independent.

Solution. Again, it is clear that A is symmetric. For all $x_1, \ldots, x_n \in \mathbb{R}$ we have

$$\sum_{i,j=1}^n a_{ij}x_ix_j = \sum_{i,j=1}^n \langle v_i, v_j\rangle x_ix_j = \sum_{i,j=1}^n \langle x_iv_i, x_jv_j\rangle = \left\langle \sum_{i=1}^n x_iv_i, \sum_{j=1}^n x_jv_j\right\rangle = \left\|\sum_{i=1}^n x_iv_i\right\|^2 \geq 0,$$

with equality if and only if $\sum_{i=1}^n x_iv_i = 0$. The result follows. □
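The identity at the heart of this solution, ${}^tXAX = \|x_1v_1 + \cdots + x_nv_n\|^2$, can be checked numerically. The sketch below assumes the canonical inner product on $\mathbb{R}^m$ and uses example vectors of our own choosing; for a linearly dependent family it exhibits a nonzero X with ${}^tXAX = 0$, so the Gram matrix is positive but not positive definite.

```python
def gram_matrix(vectors):
    """Gram matrix [<v_i, v_j>] for the canonical inner product on R^m."""
    return [[sum(a * b for a, b in zip(u, w)) for w in vectors] for u in vectors]

def quadratic_value(A, x):
    """tX A X = sum over i, j of a_ij x_i x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def combination(x, vectors):
    """The vector x_1 v_1 + ... + x_n v_n."""
    return [sum(c * v[k] for c, v in zip(x, vectors)) for k in range(len(vectors[0]))]

vs = [(1, 0, 2), (0, 1, -1), (1, 1, 1)]
x = (2, -3, 1)
G = gram_matrix(vs)
w = combination(x, vs)
print(quadratic_value(G, x), sum(c * c for c in w))  # 77 77

# A dependent family: 2 * (1, 2) - (2, 4) = 0, so tX G X = 0 for X = (2, -1).
print(quadratic_value(gram_matrix([(1, 2), (2, 4)]), (2, -1)))  # 0
```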


Problem 10.35. Let $n \geq 1$ and let $A = [a_{ij}] \in M_n(\mathbb{R})$ be defined by $a_{ij} = \min(i,j)$. Prove that A is symmetric and positive definite.

Solution. It is clear that the matrix is symmetric. Note that we can write

$$\min(i,j) = \sum_{k \leq i,\ k \leq j} 1.$$

Doing so and interchanging the order of summation, we see that

$$\sum_{i=1}^n\sum_{j=1}^n \min(i,j)x_ix_j = \sum_{k=1}^n\sum_{i=k}^n\sum_{j=k}^n x_ix_j = \sum_{k=1}^n\left(\sum_{i=k}^n x_i\right)^2.$$

This last expression is clearly nonnegative, and it equals 0 if and only if

$$x_1 + \cdots + x_n = 0, \quad x_2 + \cdots + x_n = 0, \quad \ldots, \quad x_n = 0.$$

Subtracting each equation from the one before it, we see that the unique solution is $x_1 = x_2 = \cdots = x_n = 0$, which shows that the matrix is positive definite.

An alternative argument is to note that

$$\min(i,j) = \int_0^\infty f_i(x)f_j(x)\,dx,$$

where $f_i(x) = 1$ for $x \in [0,i]$ and $f_i(x) = 0$ for $x > i$ (i.e., $f_i$ is the characteristic function of the interval [0, i]). It follows that A is the Gram matrix of the family $f_1, \ldots, f_n$, thus it is symmetric and positive. Since $f_1, \ldots, f_n$ are linearly independent (in the space of integrable functions on $[0,\infty)$), it follows that A is positive definite (all this uses Problem 10.34).

Yet another argument is to note that $A = {}^tBB$, where B is the upper triangular matrix all of whose nonzero coefficients are equal to 1. Since B is invertible, we deduce that A is positive definite by Problem 10.31. □
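Both the sum-of-squares identity and the factorization $A = {}^tBB$ from the solution above are easy to confirm for small n. The following Python sketch (helper names and the test vector are our own) does so with integer arithmetic.

```python
def min_matrix(n):
    """A = [min(i, j)] for 1 <= i, j <= n."""
    return [[min(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

def upper_ones(n):
    """Upper triangular B whose entries on and above the diagonal all equal 1."""
    return [[1 if i <= j else 0 for j in range(n)] for i in range(n)]

def tBB(B):
    """Compute tB B, whose (i, j) entry is sum over k of B[k][i] * B[k][j]."""
    n = len(B)
    return [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def quad(A, x):
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def tail_sums_squared(x):
    """Sum over k of (x_k + ... + x_n)^2, the right-hand side of the identity."""
    return sum(sum(x[k:]) ** 2 for k in range(len(x)))

for n in range(1, 8):
    assert tBB(upper_ones(n)) == min_matrix(n)

x = [3, -1, 4, -1, 5]
print(quad(min_matrix(5), x), tail_sums_squared(x))  # 254 254
```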
10.3.1 Problems for Practice


1. Consider the map $q : \mathbb{R}^3 \to \mathbb{R}$ defined by
Wx, y2) = (x + 2y +37 + O+ -O-z 


a) Prove that q is a quadratic form and find its polar form b. 

b) Is q positive definite? 

c) Give the matrix of q with respect to the canonical basis of $\mathbb{R}^3$.
d) Consider the vectors

$$v_1 = (2,0,0), \quad v_2 = (5,1,1), \quad v_3 = (1,1,-1).$$

Prove that $(v_1, v_2, v_3)$ is a basis of $\mathbb{R}^3$ and find the matrix of b with respect to this basis.


2. Consider the map $q : \mathbb{R}^3 \to \mathbb{R}$ defined by
q(x, y, z) = x(x = y +z)= 2y +2). 


a) Prove that q is a quadratic form and find its polar form b. 
b) Find the matrix of q with respect to the canonical basis of $\mathbb{R}^3$.
c) Find those vectors $v \in \mathbb{R}^3$ such that $b(v, w) = 0$ for all vectors $w \in \mathbb{R}^3$.


3. Consider the quadratic form q on $\mathbb{R}^3$ defined by
q(x, y, zZ) = 2x(x +y =z) +y 42. 


a) Find the matrix of q with respect to the canonical basis of $\mathbb{R}^3$.

b) Write q in the form $\sum_{i=1}^r \alpha_i l_i^2$ with $l_1, \ldots, l_r$ linearly independent linear forms.

c) Find the rank, signature, and discriminant of q.

d) Find a q-orthogonal basis of $\mathbb{R}^3$ and give the matrix of q with respect to this basis.


4. Is the matrix $A = [a_{ij}] \in M_n(\mathbb{R})$ with $a_{ij} = i \cdot j$ positive? Is it positive definite?

5. a) Prove that a symmetric positive definite matrix is invertible.
b) Prove that a symmetric positive matrix is positive definite if and only if it is invertible.

6. All entries but the diagonal ones of the matrix $A \in M_n(\mathbb{R})$ are equal to −1, while all diagonal entries are equal to $n - 1$. Is A positive? Is it positive definite?

7. Prove that any symmetric positive matrix $A \in M_n(\mathbb{R})$ is the Gram matrix of a family of vectors $v_1, \ldots, v_n \in \mathbb{R}^n$. Hint: use Theorem 10.33.

8. Let V be an $\mathbb{R}$-vector space endowed with an inner product $\langle\,,\rangle$ and let $x_1, \ldots, x_n$ be vectors in V. The Gram determinant of $x_1, \ldots, x_n$, denoted $G(x_1, \ldots, x_n)$, is by definition the determinant of the Gram matrix $[\langle x_i, x_j\rangle]_{1 \leq i,j \leq n}$. Prove that $x_1, \ldots, x_n$ are linearly independent if and only if $G(x_1, \ldots, x_n) \neq 0$.


9. Compute the Gram determinant of the vectors

$$x_1 = (1,2,1), \quad x_2 = (-1,-1,2), \quad x_3 = (1,0,-1)$$

in $\mathbb{R}^3$. Are they linearly independent?

10. Prove that the matrix $A = \left[\frac{1}{i+j}\right]_{1 \leq i,j \leq n}$ is symmetric and positive (hint: what is $\int_0^1 t^{i+j-1}\,dt$?).

11. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a matrix such that $a_{ij} = 1$ for $i \neq j$, and $a_{ii} > 1$ for all $i \in [1, n]$. Prove that A is symmetric and positive definite.

12. Let n be a positive integer. Consider the space V of polynomials of degree at most n with real coefficients. Define a map

$$b : V \times V \to \mathbb{R}, \quad b(P,Q) = \int_0^1 tP(t)Q'(t)\,dt,$$

where Q′ is the derivative of Q.

a) Prove that b is a bilinear form on V. Is it symmetric?
b) Let q be the quadratic form attached to b, so that $q(x) = b(x, x)$. Find the matrix of q with respect to the basis $1, X, \ldots, X^n$ of V.


13. In this long problem we establish the link between sesquilinear maps and matrices, thus extending the results of this section to finite dimensional $\mathbb{C}$-vector spaces. Let V be a finite dimensional $\mathbb{C}$-vector space and let $B = (e_1, \ldots, e_n)$ be a basis of V. Recall that S(V) is the set of sesquilinear forms on V.

a) Let $\varphi \in S(V)$ be a sesquilinear form on V. The matrix of φ with respect to B is the matrix $A = [a_{ij}] \in M_n(\mathbb{C})$ where $a_{ij} = \varphi(e_i, e_j)$ for $1 \leq i, j \leq n$. Prove that for all vectors $x, y \in V$ we have

$$\varphi(x,y) = X^*AY,$$

where X, Y are the column vectors whose coordinates are the coordinates of x, y with respect to B, and $X^* = {}^t\overline{X}$ is the row vector whose coordinates are the complex conjugates of the coordinates of x with respect to B.

b) Prove that A is the unique matrix having the property stated in a).

c) Prove that the map sending $\varphi \in S(V)$ to its matrix with respect to B is an isomorphism of $\mathbb{C}$-vector spaces between S(V) and $M_n(\mathbb{C})$.

d) Let $\varphi \in S(V)$ and let A be its matrix with respect to B. Prove that φ is hermitian if and only if A satisfies $A = {}^t\overline{A}$. Such a matrix is called conjugate symmetric or hermitian. We usually write $A^*$ instead of ${}^t\overline{A}$, so a matrix A is hermitian if and only if $A = A^*$.

e) Let B′ be another basis of V and let P be the change of basis matrix from B to B′. If $x, y \in V$, let X, Y (respectively X′, Y′) be the column vectors whose coordinates are the coordinates of x, y with respect to B (respectively B′). Prove that if A (respectively A′) is the matrix of φ with respect to B (respectively B′), then

$$A' = P^*AP.$$

f) A hermitian matrix $A \in M_n(\mathbb{C})$ is called positive (respectively positive definite) if $X^*AX \geq 0$ for all $X \in \mathbb{C}^n$ (respectively if moreover $X^*AX \neq 0$ for $X \neq 0$). Prove that for any matrix $B \in M_n(\mathbb{C})$ the matrices $B^*B$ and $BB^*$ are hermitian positive, and they are hermitian positive definite if and only if B is invertible.

g) Prove that any hermitian positive matrix A can be written as $BB^*$ for some matrix $B \in M_n(\mathbb{C})$.
Let b be a symmetric bilinear form on a vector space V over $\mathbb{R}$ (for now we don’t assume that V is finite dimensional). For each $y \in V$ the map $x \mapsto b(x,y)$ is by definition linear, thus it is a linear form on V. Letting $V^*$ be the dual of V, we therefore obtain a map

$$\varphi_b : V \to V^*, \quad \varphi_b(y)(x) = b(x,y).$$

Since for all $x \in V$ the map $y \mapsto b(x,y)$ is linear, it follows that $\varphi_b$ is a linear map.


Problem 10.36. Suppose that V is finite dimensional and let $e_1, \ldots, e_n$ be a basis of V. Let $e_1^*, \ldots, e_n^*$ be the dual basis¹ of $e_1, \ldots, e_n$. Prove that the matrix of $\varphi_b$ with respect to the bases $e_1, \ldots, e_n$ and $e_1^*, \ldots, e_n^*$ is the matrix of b in the basis $e_1, \ldots, e_n$.

Solution. For $x = x_1e_1 + \cdots + x_ne_n \in V$ we have

$$\varphi_b(e_i)(x) = b(x, e_i) = x_1b(e_1, e_i) + \cdots + x_nb(e_n, e_i) = b(e_1, e_i)e_1^*(x) + \cdots + b(e_n, e_i)e_n^*(x).$$

Thus

$$\varphi_b(e_i) = b(e_1, e_i)e_1^* + \cdots + b(e_n, e_i)e_n^*,$$

so the i-th column of the matrix of $\varphi_b$ consists of $b(e_1, e_i), \ldots, b(e_n, e_i)$. Since $b(e_j, e_i) = b(e_i, e_j)$, the result follows. □

¹Recall that $e_i^*(e_j) = 1_{i=j}$, where $1_{i=j}$ equals 1 if i = j and 0 otherwise.
The following result, though very simple, is very useful:

Theorem 10.37 (Riesz’s Representation Theorem). If V is a Euclidean (thus finite dimensional) space with inner product $\langle\,,\rangle$, then the map $\varphi_{\langle,\rangle} : V \to V^*$ is an isomorphism. In other words, for any linear map $l : V \to \mathbb{R}$ there is a unique vector $v \in V$ such that $l(x) = \langle v, x\rangle$ for all $x \in V$.

Proof. Since $\dim V = \dim V^*$, it suffices to prove that $\varphi_{\langle,\rangle}$ is injective. Assume that $\varphi_{\langle,\rangle}(x) = 0$ for some $x \in V$. Then by definition $\langle y, x\rangle = 0$ for all $y \in V$, in particular $\langle x, x\rangle = 0$. But then $\|x\|^2 = 0$, where $\|\cdot\|$ is the norm associated with the inner product, and so x = 0. □


Let V be again an arbitrary vector space over $\mathbb{R}$ and let b be a symmetric bilinear form on V.

Definition 10.38. a) Two vectors $x, y \in V$ are called orthogonal (with respect to b) if $b(x, y) = 0$.

b) The orthogonal $S^\perp$ of a subset S of V is the set of vectors $v \in V$ which are orthogonal to each element of S.

c) Two subsets S, T of V are called orthogonal if $S \subset T^\perp$, that is, any element of S is orthogonal to any element of T.


Remark 10.39. Suppose that $\langle\,,\rangle$ is an inner product on V, with associated norm $\|\cdot\|$ (thus $\|x\| = \sqrt{\langle x, x\rangle}$). Then vectors $x, y \in V$ are orthogonal if and only if

$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$

This is the Pythagorean theorem and follows directly from the polarization identity

$$\|x + y\|^2 = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2.$$


Coming back to the general case, note that $b(x, y) = 0$ is equivalent to $\varphi_b(y)(x) = 0$, i.e., x and the linear form $\varphi_b(y)$ are orthogonal in the sense of duality. This allows us to use the results we have already established in the chapter concerned with duality to obtain information about symmetric bilinear forms. As a consequence, we obtain the following fundamental result:

Theorem 10.40. Let V be a Euclidean space and let W be a subspace of V. Then $W^\perp \oplus W = V$, in particular

$$\dim W^\perp + \dim W = \dim V.$$

Moreover, $(W^\perp)^\perp = W$.

We can slightly refine the previous theorem by allowing infinite dimensional ambient spaces and finite dimensional subspaces therein.
Theorem 10.41. Let V be any $\mathbb{R}$-vector space endowed with an inner product and let W be a finite dimensional subspace of V. Then

$$W \oplus W^\perp = V \quad \text{and} \quad (W^\perp)^\perp = W.$$

Proof. Let $\langle\,,\rangle$ be the inner product on V, with associated norm $\|\cdot\|$. We start by proving that $W \oplus W^\perp = V$. If $x \in W \cap W^\perp$, then $\langle x, x\rangle = 0$, that is $\|x\|^2 = 0$, and so x = 0. Thus $W \cap W^\perp = \{0\}$. We still need to prove that $W + W^\perp = V$, so let $x \in V$ be arbitrary and consider the map $f : W \to \mathbb{R}$ defined by $f(y) = \langle x, y\rangle$. Then f is a linear form on W. Since W is a Euclidean space (for the inner product inherited from the one on V), by Theorem 10.37 there is a unique $z \in W$ such that $f(y) = \langle z, y\rangle$ for all $y \in W$. We deduce that $\langle x - z, y\rangle = 0$ for all $y \in W$, thus $x - z \in W^\perp$. Since $z \in W$, we conclude that $x \in W + W^\perp$ and the result follows.

It remains to prove that $(W^\perp)^\perp = W$. By definition W is contained in $(W^\perp)^\perp$, so let $x \in (W^\perp)^\perp$. By the result established in the previous paragraph we can write $x = y + z$ with $y \in W$ and $z \in W^\perp$. Since $x \in (W^\perp)^\perp$, we have $\langle x, z\rangle = 0$, thus $\langle y, z\rangle + \|z\|^2 = 0$. But $y \in W$ and $z \in W^\perp$, thus $\langle y, z\rangle = 0$ and so $\|z\|^2 = 0$, then z = 0 and finally $x = y \in W$. □


Remark 10.42. The hypothesis that W is finite dimensional is crucial in the previous theorem. Indeed, consider the following situation: V is the space of continuous real-valued maps on [0, 1] and W is the subspace consisting of maps f such that f(0) = 0. Endow V with the inner product given by

$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$

Then the orthogonal $W^\perp$ of W is reduced to {0}, thus we do not have $W \oplus W^\perp = V$ or $(W^\perp)^\perp = W$ (indeed $(W^\perp)^\perp = V$). To prove that $W^\perp = \{0\}$, let $f \in W^\perp$. Note that the map g defined by $g(x) = xf(x)$ belongs to W and so $\langle f, g\rangle = 0$. This can be written as

$$\int_0^1 xf(x)^2\,dx = 0.$$

Since the map $x \mapsto xf(x)^2$ is continuous, nonnegative, and with average 0, it is the zero map and so f(x) = 0 for all $x \in (0, 1]$. By continuity, we deduce that f = 0.


The previous theorem has many important applications in analysis, in particular concerning minimization problems. We will see a few examples in the sequel, but before that we introduce two very important definitions.

Definition 10.43. Let V be an $\mathbb{R}$-vector space endowed with an inner product. Let W be a finite dimensional subspace of V. By Theorem 10.41 we have $V = W \oplus W^\perp$. The orthogonal projection onto W is the projection $p_W : V \to W$ onto W along $W^\perp$. In other words, for $x \in V$, $p_W(x)$ is the unique vector in W such that $x - p_W(x) \in W^\perp$.
Remark 10.44. Simply by definition we have

$$p_W(x) + p_{W^\perp}(x) = x$$

for all $x \in V$ and all finite dimensional subspaces W of V. This can be very useful in practice, since it might be easier to compute the orthogonal projection onto $W^\perp$ than that onto W.

Example 10.45. Endow $\mathbb{R}^3$ with the canonical inner product. Let $W = \{(0, 0, a_3) \mid a_3 \in \mathbb{R}\}$. Then the orthogonal complement $W^\perp$ is

$$W^\perp = \{(a_1, a_2, 0) \mid a_1, a_2 \in \mathbb{R}\}.$$

Note that W is the Cartesian z-axis and $W^\perp$ is the Cartesian xy-plane. The orthogonal projection $p_W$ of $\mathbb{R}^3$ onto W is the map

$$p_W : \mathbb{R}^3 \to \mathbb{R}^3, \quad p_W(x, y, z) = (0, 0, z).$$
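Problem 10.46. Let

$$v_1 = (1,-1,0,0) \quad \text{and} \quad v_2 = (1,0,-1,0).$$

Find the matrix of the orthogonal projection of $\mathbb{R}^4$ onto $W = \operatorname{Span}(v_1, v_2)$.

Solution. Let $v \in \mathbb{R}^4$ and write $p_W(v) = av_1 + bv_2$ for some real numbers a, b. Since $v - p_W(v)$ is orthogonal to W, we have

$$\langle v - (av_1 + bv_2), v_1\rangle = \langle v - (av_1 + bv_2), v_2\rangle = 0,$$

which can also be written, taking into account that

$$\|v_1\|^2 = 2, \quad \|v_2\|^2 = 2, \quad \langle v_1, v_2\rangle = 1,$$

as

$$2a + b = \langle v, v_1\rangle, \quad a + 2b = \langle v, v_2\rangle.$$

Solving the system yields

$$a = \frac{2\langle v, v_1\rangle - \langle v, v_2\rangle}{3}, \quad b = \frac{2\langle v, v_2\rangle - \langle v, v_1\rangle}{3}.$$

Since $p_W(v) = av_1 + bv_2$, we can now easily compute the values $p_W(e_1), \ldots, p_W(e_4)$, where $e_1, \ldots, e_4$ is the canonical basis of $\mathbb{R}^4$. More precisely, for $v = e_1$ we obtain $a = \frac13$ and $b = \frac13$, thus

$$p_W(e_1) = \frac{v_1 + v_2}{3} = \left(\frac23, -\frac13, -\frac13, 0\right),$$

and similar arguments yield

$$p_W(e_2) = -\frac23 v_1 + \frac13 v_2 = \left(-\frac13, \frac23, -\frac13, 0\right), \quad p_W(e_3) = \frac13 v_1 - \frac23 v_2 = \left(-\frac13, -\frac13, \frac23, 0\right),$$

and finally

$$p_W(e_4) = 0 \cdot v_1 + 0 \cdot v_2 = (0, 0, 0, 0).$$

We conclude that the desired matrix is

$$\frac13\begin{bmatrix} 2 & -1 & -1 & 0 \\ -1 & 2 & -1 & 0 \\ -1 & -1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$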
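The method of this solution (solve the Gram system for the coefficients of $p_W(v)$, then apply it to each vector of the canonical basis) can be sketched in Python with exact rational arithmetic. The helper names are hypothetical, W is assumed to be given by a basis in $\mathbb{R}^m$ with the canonical inner product, and the Gram matrix is therefore assumed invertible.

```python
from fractions import Fraction

def dot(u, v):
    """Canonical inner product on R^m."""
    return sum(a * b for a, b in zip(u, v))

def solve_linear(M, rhs):
    """Solve M t = rhs by Gauss-Jordan elimination over the rationals (M invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def projection(v, basis):
    """Orthogonal projection of v onto W = Span(basis).

    Writes p_W(v) = a_1 w_1 + ... + a_k w_k and solves the Gram system
    sum_j a_j <w_j, w_i> = <v, w_i>, which says that v - p_W(v) is orthogonal to W.
    """
    G = [[dot(wj, wi) for wj in basis] for wi in basis]
    a = solve_linear(G, [dot(v, wi) for wi in basis])
    return [sum(c * w[k] for c, w in zip(a, basis)) for k in range(len(v))]

def projection_matrix(basis, m):
    """Matrix of p_W in the canonical basis of R^m: column k is p_W(e_k)."""
    cols = [projection([int(i == k) for i in range(m)], basis) for k in range(m)]
    return [[cols[j][i] for j in range(m)] for i in range(m)]

v1, v2 = (1, -1, 0, 0), (1, 0, -1, 0)
M = projection_matrix([v1, v2], 4)
for row in M:
    print([str(x) for x in row])
# ['2/3', '-1/3', '-1/3', '0']
# ['-1/3', '2/3', '-1/3', '0']
# ['-1/3', '-1/3', '2/3', '0']
# ['0', '0', '0', '0']
```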


Definition 10.47. Let V be a Euclidean space. A linear map $p : V \to V$ is called an orthogonal projection if there is a subspace W of V such that p is the orthogonal projection onto W.


The next theorem describes orthogonal projections as solutions to minimization problems. The result is absolutely fundamental:

Theorem 10.48. Let V be an $\mathbb{R}$-vector space with an inner product $\langle\,,\rangle$ and with associated norm $\|\cdot\|$. Let W be a finite dimensional subspace of V and let $v \in V$. Then $p_W(v)$ is the element of W at smallest distance from v, i.e.

$$\|v - p_W(v)\| = \min_{x \in W}\|x - v\|.$$

Moreover, $p_W(v)$ is the unique element of W with this property.

Proof. Let $x \in W$ and apply the Pythagorean theorem (Remark 10.39; observe that $x - p_W(v) \in W$ and $v - p_W(v) \in W^\perp$) to obtain

$$\|x - v\|^2 = \|(x - p_W(v)) + (p_W(v) - v)\|^2 = \|x - p_W(v)\|^2 + \|p_W(v) - v\|^2 \geq \|p_W(v) - v\|^2.$$

This shows that $\|x - v\| \geq \|p_W(v) - v\|$ for all $x \in W$ and yields the first part of the theorem. For the second part, note that equality holds in the above inequality if and only if $\|x - p_W(v)\| = 0$, that is, $x = p_W(v)$. This finishes the proof of the theorem. □


Definition 10.49. With the notations of Theorem 10.48, we define the distance from v to W as

$$d(v, W) = \|v - p_W(v)\| = \min_{x \in W}\|x - v\|.$$


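Problem 10.50. Let V be an $\mathbb{R}$-vector space endowed with an inner product $\langle\,,\rangle$ and let W be a finite dimensional subspace of V. Let $x_1, \ldots, x_n$ be a basis of W and let $v \in V$. Prove that

$$d(v, W)^2 = \frac{G(v, x_1, \ldots, x_n)}{G(x_1, \ldots, x_n)},$$

where $G(x_1, \ldots, x_n) = \det(\langle x_i, x_j\rangle)$ is the Gram determinant of the family $x_1, \ldots, x_n$.

Solution. Write $p_W(v) = a_1x_1 + \cdots + a_nx_n$ for some real numbers $a_1, \ldots, a_n$. By definition, and since $v - p_W(v)$ is orthogonal to $p_W(v) \in W$,

$$d(v, W)^2 = \|v - p_W(v)\|^2 = \langle v - p_W(v), v\rangle = \|v\|^2 - \langle v, p_W(v)\rangle,$$

thus

$$d(v, W)^2 + a_1\langle v, x_1\rangle + \cdots + a_n\langle v, x_n\rangle = \|v\|^2.$$

Since

$$\langle v, x_i\rangle = \langle v - p_W(v), x_i\rangle + \langle p_W(v), x_i\rangle = \langle p_W(v), x_i\rangle = a_1\langle x_1, x_i\rangle + a_2\langle x_2, x_i\rangle + \cdots + a_n\langle x_n, x_i\rangle,$$

we deduce that $d(v, W)^2$ and $a_1, \ldots, a_n$ are solutions of the linear system in the unknowns $t_0, \ldots, t_n$

$$\begin{cases} t_0 + t_1\langle v, x_1\rangle + \cdots + t_n\langle v, x_n\rangle = \|v\|^2 \\ t_1\langle x_1, x_1\rangle + t_2\langle x_2, x_1\rangle + \cdots + t_n\langle x_n, x_1\rangle = \langle v, x_1\rangle \\ \qquad\vdots \\ t_1\langle x_1, x_n\rangle + t_2\langle x_2, x_n\rangle + \cdots + t_n\langle x_n, x_n\rangle = \langle v, x_n\rangle \end{cases}$$

The determinant of this system is $G(x_1, \ldots, x_n)$, which is nonzero because $x_1, \ldots, x_n$ are linearly independent, and replacing its first column by the right-hand side produces the Gram matrix of $v, x_1, \ldots, x_n$. The result then follows straight from Cramer’s rule. □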
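The Gram determinant formula translates directly into code. The Python sketch below (our own helper names; canonical inner product on $\mathbb{R}^m$; exact arithmetic) computes $d(v, W)^2$ as a ratio of Gram determinants, given a basis of W.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    result = Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            result = -result
        result *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return result

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_det(vectors):
    """G(x_1, ..., x_k) = det(<x_i, x_j>)."""
    return det([[dot(u, w) for w in vectors] for u in vectors])

def squared_distance(v, basis):
    """d(v, W)^2 = G(v, x_1, ..., x_k) / G(x_1, ..., x_k) for a basis x_1, ..., x_k of W."""
    return gram_det([v] + list(basis)) / gram_det(basis)

# W = the xy-plane in R^3: the distance from (1, 2, 3) to W is 3.
print(squared_distance((1, 2, 3), [(1, 0, 0), (0, 1, 0)]))  # 9
```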
Problem 10.51. Consider the vectors $v_1$, $v_2$ and b [vector display not recoverable].

Find the distance from b to the space $W = \operatorname{Span}(v_1, v_2)$.

Solution. We start by finding the orthogonal projection of b on W by writing

$$b = p_W(b) + u$$

with $\langle u, v_1\rangle = \langle u, v_2\rangle = 0$. Writing $p_W(b) = \alpha v_1 + \beta v_2$ we obtain

$$\alpha\|v_1\|^2 + \beta\langle v_1, v_2\rangle = \langle b, v_1\rangle$$

and

$$\alpha\langle v_1, v_2\rangle + \beta\|v_2\|^2 = \langle b, v_2\rangle,$$

which reduces to

$$12\alpha = 6, \quad 4\beta = 6.$$

We deduce that

$$p_W(b) = \frac12 v_1 + \frac32 v_2$$

and so

$$d(b, W) = \|b - p_W(b)\| = \sqrt{24}.$$

Of course, one can also use the previous problem, by computing (exercise left to the reader) $G(v_1, v_2) = 48$ and $G(b, v_1, v_2) = 32 \cdot 36$, then

$$d(b, W) = \sqrt{\frac{G(b, v_1, v_2)}{G(v_1, v_2)}} = \sqrt{\frac{32 \cdot 36}{48}} = \sqrt{24}.$$

However, we strongly advise the reader to redo, each time, the argument explained in the first part of the solution. □


Problem 10.52. Let n be a positive integer and let V be the vector space of 
polynomials with real coefficients whose degree does not exceed n. For P,Q € V 
define 


(P,Q) = f P(x) O(x)en*dx. 


a) Explain why ( , ) is a well-defined inner product on V. 
b) Find 


(0,6) 
min / Cae? enka )e da 
0 


d1,....€n ER 


Solution. a) The definition makes sense, since xe~* is integrable on [0, 00) for 
all k > 0. More precisely, we have the following classical result, which follows 
easily by induction on k combined with integration by parts 


f exkdx = k!. 
0 


It is easy to see that (, ) is indeed an inner product: it is clearly symmetric 
and bilinear, and we have 


(P, P) = L P(x} e™*dx > 0. 


If the last quantity equals 0, then so does h e™P(x}dx < ie e-* P(x)*dx. 
Since x > e~* P(x)? is continuous, nonnegative, and with average value 0 on 
[0, 1], it must be the zero map, thus P vanishes on [0, 1] and so P = 0 (because 
P is a polynomial). This proves the claim that (, ) is an inner product. 

b) Let W be the span of $X, X^2, \ldots, X^n$. Then W is a subspace of V and the problem asks us to find

$$\inf_{P \in W} \|1 + P\|^2 = d(-1, W)^2.$$

We know that the minimum value is attained when P is the orthogonal projection of $-1$ on W. This is characterized by $\langle P + 1, Q \rangle = 0$ for all $Q \in W$, or equivalently $\langle P + 1, X^k \rangle = 0$ for all $1 \le k \le n$. Using the identity

$$\int_0^{\infty} e^{-x} x^k\,dx = k!$$

and writing $P = a_1 X + \cdots + a_n X^n$, we can rewrite the condition as

$$k! + \sum_{i=1}^{n} a_i (k+i)! = 0 \iff 1 + \sum_{i=1}^{n} a_i (k+1)(k+2)\cdots(k+i) = 0.$$

Thus the polynomial $Q = 1 + \sum_{i=1}^{n} a_i (X+1)\cdots(X+i)$ vanishes at $1, 2, \ldots, n$, and since it has degree n and leading coefficient $a_n$, we must have

$$Q = a_n (X-1)\cdots(X-n).$$

We need to evaluate (using that $\langle 1 + P, P \rangle = 0$, as $P \in W$)

$$d(-1, W)^2 = \|1 + P\|^2 = \langle 1 + P, 1 \rangle = 1 + \sum_{i=1}^{n} a_i\, i! = Q(0) = (-1)^n n!\, a_n.$$

Taking $X = -1$ in the equality

$$1 + \sum_{i=1}^{n} a_i (X+1)\cdots(X+i) = a_n (X-1)\cdots(X-n)$$

yields $1 = a_n (-1)^n (n+1)!$. We conclude that the answer to the problem is

$$(-1)^n n!\, a_n = \frac{n!}{(n+1)!} = \frac{1}{n+1}. \qquad \square$$
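The closed form $\frac{1}{n+1}$ can be double-checked mechanically. The following is a minimal Python sketch (an illustration, not part of the original solution) that solves the normal equations $\langle P + 1, X^k \rangle = 0$, $1 \le k \le n$, in exact rational arithmetic, using $\langle X^i, X^j \rangle = (i+j)!$. Polynomials are stored as coefficient lists, and the helper names `inner`, `solve` and `min_value` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def inner(p, q):
    """<P, Q> = integral_0^oo P(x) Q(x) e^{-x} dx, via integral_0^oo x^k e^{-x} dx = k!.
    Polynomials are coefficient lists: p[i] is the coefficient of X^i."""
    return sum(Fraction(a) * b * factorial(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))


def solve(mat, rhs):
    """Exact Gauss-Jordan elimination over the rationals (mat must be invertible)."""
    n = len(rhs)
    m = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(mat, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def min_value(n):
    """min over a_1..a_n of ||1 + a_1 X + ... + a_n X^n||^2 (Problem 10.52 b)."""
    basis = [[0] * k + [1] for k in range(1, n + 1)]      # X, X^2, ..., X^n
    # Normal equations: sum_i a_i <X^i, X^k> = -<1, X^k> for k = 1, ..., n.
    gram = [[inner(b_i, b_k) for b_i in basis] for b_k in basis]
    rhs = [-inner([1], b_k) for b_k in basis]
    a = solve(gram, rhs)
    one_plus_p = [1] + a                                  # 1 + a_1 X + ... + a_n X^n
    return inner(one_plus_p, one_plus_p)


for n in range(1, 6):
    print(n, min_value(n))      # the values 1/2, 1/3, 1/4, 1/5, 1/6
```

Exact fractions are used on purpose: the Gram matrix $((i+j)!)_{i,j}$ quickly becomes ill-conditioned, so floating-point elimination would blur the comparison for larger n.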


10.4.1 Problems for Practice 


Whenever it is not specified, the inner product on $\mathbb{R}^n$ is the canonical one.


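1. Let

$$x_1 = \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 6 \\ 4 \\ 2 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 4 \\ -2 \\ -3 \end{pmatrix}.$$

Find the distance from b to the plane spanned by $x_1$ and $x_2$.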
2. Determine the orthogonal projection of b = (2,1, 1) on the subspace spanned 
by (1, 1, 1) and (1,0, 1). 
3. Let W be the subspace of $\mathbb{R}^4$ spanned by $w = (1, -1, 1, -1)$. Find the orthogonal projection of $b = (3, 0, 3, -2)$ on the orthogonal complement $W^{\perp}$.

4. Consider the vector space V of continuous real-valued maps on $[0, 1]$, with the inner product defined by

$$\langle f, g \rangle = \int_0^1 f(x) g(x)\,dx.$$

Determine which of the functions $f(x) = x$ and $g(x) = x^2$ is closer (with respect to the distance associated with the norm induced by the inner product) to the function $h(x) = x^3$.


5. a) Let V be a Euclidean space and let $T_1, T_2$ be orthogonal projections such that $T_1 \circ T_2$ is a projection. Prove that $T_1 \circ T_2 = T_2 \circ T_1$. Hint: use Problem 14.
b) Does this result remain true if we no longer assume that $T_1$ and $T_2$ are orthogonal projections, but only projections?


6. Let $a_1, \ldots, a_n$ be real numbers, not all of them equal to 0. Let H be the set of vectors $(x_1, \ldots, x_n) \in \mathbb{R}^n$ such that $a_1 x_1 + \cdots + a_n x_n = 0$. Find the matrix of the orthogonal projection onto H with respect to the canonical basis of $\mathbb{R}^n$.


7. Let V be the vector space of polynomials with real coefficients whose degree does not exceed n. If $P = \sum_{i=0}^{n} a_i X^i \in V$ and $Q = \sum_{i=0}^{n} b_i X^i \in V$, define

$$\langle P, Q \rangle = \sum_{i=0}^{n} a_i b_i.$$

Let H be the subspace of polynomials in V vanishing at 1. Compute $d(X, H)$.


8. Let V be the set of polynomials with real coefficients and degree not exceeding 3. Find

$$\min_{P \in V} \int_{-\pi}^{\pi} |P(x) - \sin x|^2\,dx.$$


9. Find the vector in $\operatorname{Span}((1, 2, 1), (-1, 3, -4))$ which is closest (with respect to the Euclidean norm) to the vector $(-1, 1, 1)$.

10. Let $v_1 = (0, 1, 1, 0)$, $v_2 = (1, -1, 1, -1)$ in $\mathbb{R}^4$. Let W be the span of $v_1, v_2$.
a) Find the matrix of the orthogonal projection of $\mathbb{R}^4$ onto W with respect to the canonical basis of $\mathbb{R}^4$.
b) Compute the distance from $(1, 1, 1, 1)$ to W.


11. Let V be the space of smooth real-valued maps on $[0, 1]$, endowed with

$$\langle f, g \rangle = \int_0^1 f(x) g(x)\,dx + \int_0^1 f'(x) g'(x)\,dx.$$

a) Prove that $\langle\ ,\ \rangle$ is an inner product on V.

b) Let $W_1$ be the subspace of V consisting of maps f vanishing at 0 and 1. Let $W_2$ be the subspace of V consisting of maps f such that $f'' = f$. Prove that $W_1 \oplus W_2 = V$ and that $W_1$ and $W_2$ are orthogonal to each other.

c) Describe the orthogonal projection of V onto $W_2$.


12. Let $(V, \langle\ ,\ \rangle)$ be a Euclidean space and let $f : V \to V$ be a map such that $\langle f(x), y \rangle = \langle x, f(y) \rangle$ for all $x, y \in V$. Prove that f is linear.

13. Let V be a Euclidean space and let $T : V \to V$ be a linear map such that $T^2 = T$, i.e., T is a projection. Prove that the following statements are equivalent:

a) T is an orthogonal projection.
b) For all $x, y \in V$ we have $\langle T(x), y \rangle = \langle x, T(y) \rangle$.

14. Let V be a Euclidean space and let T be a linear transformation on V such that $T^2 = T$, i.e., T is a projection.

a) Suppose that T is an orthogonal projection. Using the Pythagorean theorem, prove that for all $x \in V$ we have

$$\|T(x)\| \le \|x\|.$$

b) Conversely, suppose that $\|T(x)\| \le \|x\|$ for all $x \in V$. Prove that $\langle x, y \rangle = 0$ for $x \in \ker T$ and $y \in \operatorname{Im}(T)$ (hint: use that $\|T(x + cy)\|^2 \le \|x + cy\|^2$ for all real numbers c) and deduce that T is an orthogonal projection.


10.5 Orthogonal Bases 


Let V be a vector space over $\mathbb{R}$ endowed with an inner product $(x, y) \mapsto \langle x, y \rangle$, with associated norm $\|\cdot\|$ (recall that $\|x\| = \sqrt{\langle x, x \rangle}$ for all $x \in V$).


Definition 10.53. a) A family $(v_i)_{i \in I}$ of vectors in V is called orthogonal if

$$\langle v_i, v_j \rangle = 0 \quad \text{for all } i \ne j \text{ in } I.$$

It is called orthonormal if moreover $\|v_i\| = 1$ for all $i \in I$. Thus the vectors in an orthonormal family of V have norm 1 and are pairwise orthogonal.

b) An orthogonal basis of V is a basis of V which is an orthogonal family.

c) An orthonormal basis of V is a basis which is an orthonormal family.


Note that the canonical basis of $\mathbb{R}^n$ is an orthonormal basis of $\mathbb{R}^n$ with respect to the canonical inner product on $\mathbb{R}^n$. In the following two problems the reader will find two other very important examples of orthonormal families.


Problem 10.54. Let $x_0, \ldots, x_n$ be pairwise distinct real numbers and consider the space V of polynomials with real coefficients and degree not exceeding n, endowed with

$$\langle P, Q \rangle = \sum_{i=0}^{n} P(x_i) Q(x_i).$$

Prove that $\langle\ ,\ \rangle$ is an inner product on V and that the family $(L_i)_{0 \le i \le n}$, where

$$L_i(X) = \prod_{\substack{0 \le k \le n \\ k \ne i}} \frac{X - x_k}{x_i - x_k},$$

is an orthonormal family in V.


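Solution. Clearly $\langle\ ,\ \rangle$ is a symmetric bilinear form on V, and for all $P \in V$ we have

$$\langle P, P \rangle = \sum_{i=0}^{n} P(x_i)^2 \ge 0,$$

with equality if and only if $P(x_i) = 0$ for $0 \le i \le n$. Since $x_0, \ldots, x_n$ are pairwise distinct and since $\deg P \le n$, it follows that necessarily $P = 0$ (a nonzero polynomial of degree at most n has at most n roots), and so $\langle\ ,\ \rangle$ is an inner product on V.

To prove the second assertion, let $i, j \in \{0, \ldots, n\}$ and let us compute

$$\langle L_i, L_j \rangle = \sum_{k=0}^{n} L_i(x_k) L_j(x_k).$$

Now, by construction we have

$$L_i(x_j) = \delta_{ij},$$

where $\delta_{ij} = 0$ if $i \ne j$ and 1 otherwise. Thus

$$\langle L_i, L_j \rangle = \sum_{k=0}^{n} \delta_{ik} \delta_{jk}.$$

If $i \ne j$, then $\delta_{ik} \delta_{jk} = 0$ for $0 \le k \le n$, thus $\langle L_i, L_j \rangle = 0$. If $i = j$, then $\delta_{ik} \delta_{jk} = 0$ for $k \ne i$ and 1 for $k = i$, thus $\langle L_i, L_i \rangle = 1$ and the result follows. □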
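A quick exact check of this orthonormality, for an arbitrary sample choice of nodes (here $-1, 0, \tfrac12, 3$, so $n = 3$; this choice is ours): the sketch below evaluates each $L_i$ at the nodes and forms the Gram matrix $(\langle L_i, L_j \rangle)_{i,j}$ directly from the definition of the inner product. The function names are hypothetical.

```python
from fractions import Fraction


def lagrange_basis_value(i, x, nodes):
    """Evaluate L_i(x) = prod_{k != i} (x - x_k) / (x_i - x_k)."""
    value = Fraction(1)
    for k, xk in enumerate(nodes):
        if k != i:
            value *= Fraction(x - xk) / (nodes[i] - xk)
    return value


def discrete_inner(p, q, nodes):
    """<P, Q> = sum_k P(x_k) Q(x_k), with P and Q given as Python callables."""
    return sum(p(xk) * q(xk) for xk in nodes)


nodes = [Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(3)]   # pairwise distinct
L = [lambda x, i=i: lagrange_basis_value(i, x, nodes) for i in range(len(nodes))]
gram = [[discrete_inner(L[i], L[j], nodes) for j in range(len(nodes))]
        for i in range(len(nodes))]
print(gram)   # the 4 x 4 identity matrix (entries printed as Fractions)
```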


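Problem 10.55. Let V be the space of continuous $2\pi$-periodic maps $f : \mathbb{R} \to \mathbb{R}$, endowed with the inner product

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) g(x)\,dx.$$

Let $c_n, s_n \in V$ be the maps defined by

$$c_n(x) = \cos(nx), \qquad s_n(x) = \sin(nx).$$

Prove that the family

$$\mathcal{F} = \left\{ \frac{1}{\sqrt{2\pi}} c_0 \right\} \cup \left\{ \frac{1}{\sqrt{\pi}} c_n \;\middle|\; n \ge 1 \right\} \cup \left\{ \frac{1}{\sqrt{\pi}} s_n \;\middle|\; n \ge 1 \right\}$$

is an orthonormal family in V.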


Solution. To simplify notation a little bit, let $C_0 = \frac{1}{\sqrt{2\pi}}$ and, for $n \ge 1$,

$$C_n = \frac{1}{\sqrt{\pi}} c_n, \qquad S_n = \frac{1}{\sqrt{\pi}} s_n.$$

Clearly

$$\|C_0\|^2 = \int_{-\pi}^{\pi} \frac{dx}{2\pi} = 1.$$

Next,

$$\|C_n\|^2 = \int_{-\pi}^{\pi} \frac{1}{\pi} \cos^2(nx)\,dx = \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{1 + \cos(2nx)}{2}\,dx = 1,$$

since

$$\int_{-\pi}^{\pi} \cos(px)\,dx = 0 \quad \forall p \in \mathbb{Z} \setminus \{0\} \tag{10.2}$$

(a primitive of $\cos(px)$ is $\frac{1}{p}\sin(px)$, and this vanishes at $\pi$ and $-\pi$). Similarly, we obtain that $\|S_n\| = 1$ by using the identity

$$\sin^2(nx) = \frac{1 - \cos(2nx)}{2}.$$

Thus $\|v\| = 1$ for all $v \in \mathcal{F}$.

It remains to check that the elements of $\mathcal{F}$ are pairwise orthogonal. That $C_0$ is orthogonal to $C_n$ and $S_n$ for all $n \ge 1$ follows from relation (10.2) and its analogue

$$\int_{-\pi}^{\pi} \sin(px)\,dx = 0 \quad \forall p \in \mathbb{Z}. \tag{10.3}$$

Next, we check that $C_m$ and $C_n$ are orthogonal for $m \ne n$. This follows from the identity

$$\cos(nx)\cos(mx) = \frac{\cos((m-n)x) + \cos((m+n)x)}{2}$$

and relation (10.2). Similarly, the fact that $S_n$ and $S_m$ are orthogonal for $n \ne m$ is a consequence of the relation

$$\sin(nx)\sin(mx) = \frac{\cos((m-n)x) - \cos((m+n)x)}{2}$$

and (10.2). Finally, the fact that $S_n$ and $C_m$ are orthogonal for $n, m \ge 1$ follows from the relation

$$\sin(nx)\cos(mx) = \frac{\sin((m+n)x) + \sin((n-m)x)}{2}$$

and (10.3). □
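The orthonormality of $\mathcal{F}$ can also be observed numerically. The sketch below (an illustration only, with hypothetical helper names) approximates $\langle f, g \rangle$ by an equally spaced Riemann sum over one period. For trigonometric polynomials of degree smaller than the number of sample points this sum is exact up to rounding, so the computed Gram matrix of the first members of $\mathcal{F}$ should be the identity to machine precision.

```python
import math


def inner(f, g, samples=256):
    """Approximate <f, g> = integral_{-pi}^{pi} f(x) g(x) dx by an equally spaced
    Riemann sum (the trapezoidal rule, since the integrand is 2*pi-periodic)."""
    h = 2 * math.pi / samples
    return h * sum(f(-math.pi + k * h) * g(-math.pi + k * h) for k in range(samples))


N = 4  # check C_0 and C_n, S_n for 1 <= n <= N
family = [("C0", lambda x: 1 / math.sqrt(2 * math.pi))]
for n in range(1, N + 1):
    family.append((f"C{n}", lambda x, n=n: math.cos(n * x) / math.sqrt(math.pi)))
    family.append((f"S{n}", lambda x, n=n: math.sin(n * x) / math.sqrt(math.pi)))

worst = max(abs(inner(f, g) - (1.0 if a == b else 0.0))
            for a, f in family for b, g in family)
print(f"largest deviation from the identity matrix: {worst:.1e}")
```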


A fundamental property of orthogonal families of nonzero vectors (in particular, of orthonormal families) is that they are linearly independent. More precisely:

Proposition 10.56. Let V be a vector space over $\mathbb{R}$ endowed with an inner product. Then any orthogonal family $(v_i)_{i \in I}$ of nonzero vectors in V is linearly independent.

Proof. Suppose that $\sum_{i \in I} a_i v_i = 0$ for some scalars $a_i \in \mathbb{R}$, all but finitely many of which are 0. For $j \in I$ we have

$$\Big\langle v_j, \sum_{i \in I} a_i v_i \Big\rangle = 0.$$

By bilinearity, the left-hand side equals

$$\sum_{i \in I} a_i \langle v_j, v_i \rangle = a_j \|v_j\|^2,$$

the last equality being a consequence of the fact that $(v_i)_{i \in I}$ is orthogonal. We deduce (thanks to the hypothesis that $v_j \ne 0$ for all j) that $a_j = 0$ for all $j \in I$, and the result follows. □


The following result is a direct consequence of the previous proposition: 


Corollary 10.57. An orthogonal family of nonzero vectors in a Euclidean space of dimension n has at most n elements. Moreover, it has n elements if and only if it is an orthogonal basis. In particular, an orthonormal family of n vectors in an n-dimensional Euclidean space is automatically an orthonormal basis of that space.
When we have an orthonormal basis $e_1, \ldots, e_n$ of a Euclidean space V, it is rather easy to write down the coordinates of a vector $v \in V$ with respect to this basis: these coordinates are simply $\langle v, e_i \rangle$ for $1 \le i \le n$. More precisely, we have the very important formula

$$v = \sum_{i=1}^{n} \langle v, e_i \rangle e_i, \tag{10.4}$$

called the Fourier decomposition of v with respect to the orthonormal basis $e_1, \ldots, e_n$. In order to prove formula (10.4), write

$$v = \sum_{i=1}^{n} x_i e_i$$

for some real numbers $x_i$ and observe that

$$\langle v, e_j \rangle = \sum_{i=1}^{n} \langle x_i e_i, e_j \rangle = \sum_{i=1}^{n} x_i \langle e_i, e_j \rangle = x_j$$

for all $1 \le j \le n$.
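Formula (10.4) is easy to test in coordinates. The sketch below assumes the canonical inner product on $\mathbb{R}^3$ and the orthonormal basis $\frac13(1,2,2)$, $\frac13(2,1,-2)$, $\frac13(2,-2,1)$, chosen only because its entries are rational, so that Python's `fractions` module gives exact results. It computes the coordinates $\langle v, e_i \rangle$ of a sample vector and rebuilds v from them.

```python
from fractions import Fraction


def dot(u, v):
    """Canonical inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


# An orthonormal basis of R^3 with rational entries (chosen for exact arithmetic).
third = Fraction(1, 3)
basis = [
    [third * 1, third * 2, third * 2],
    [third * 2, third * 1, third * -2],
    [third * 2, third * -2, third * 1],
]
assert all(dot(e, f) == (1 if i == j else 0)
           for i, e in enumerate(basis) for j, f in enumerate(basis))

v = [Fraction(5), Fraction(-1), Fraction(4)]
coords = [dot(v, e) for e in basis]                 # x_i = <v, e_i>
rebuilt = [sum(c * e[k] for c, e in zip(coords, basis)) for k in range(3)]
print([str(c) for c in coords], rebuilt == v)       # ['11/3', '1/3', '16/3'] True
```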
Let us come back for a moment to the setup and notation of Problem 10.54. Recall that we proved in that problem that the polynomials

$$L_i(X) = \prod_{\substack{0 \le k \le n \\ k \ne i}} \frac{X - x_k}{x_i - x_k}$$

for $0 \le i \le n$ form an orthonormal family in the space V of polynomials with real coefficients and degree at most n, endowed with the inner product defined by

$$\langle P, Q \rangle = \sum_{i=0}^{n} P(x_i) Q(x_i).$$

Since $\dim V = n + 1$ and since the family $(L_i)_{0 \le i \le n}$ is orthonormal (Problem 10.54) and has $n + 1$ elements, Corollary 10.57 shows that this family is an orthonormal basis of V. Moreover, for each $P \in V$ the Fourier decomposition of P becomes

$$P = \sum_{i=0}^{n} \langle P, L_i \rangle L_i.$$

Note that

$$\langle P, L_i \rangle = \sum_{k=0}^{n} P(x_k) L_i(x_k) = P(x_i),$$

since $L_i(x_k) = 0$ for $i \ne k$ and 1 for $i = k$. We obtain in this way Lagrange's interpolation formula: for all polynomials P of degree at most n we have

$$P = \sum_{i=0}^{n} P(x_i) L_i = \sum_{i=0}^{n} P(x_i) \prod_{\substack{0 \le k \le n \\ k \ne i}} \frac{X - x_k}{x_i - x_k}.$$
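Lagrange's interpolation formula translates directly into code. In the sketch below (illustrative; `lagrange_interpolate` and the sample polynomial `P` are our own choices), a polynomial of degree 3 is sampled at four distinct nodes and then evaluated through $\sum_i P(x_i) L_i$; exact rational arithmetic lets us compare with P itself without rounding issues.

```python
from fractions import Fraction


def lagrange_interpolate(nodes, values, x):
    """Evaluate sum_i P(x_i) L_i(x), where L_i is the Lagrange basis polynomial
    attached to the pairwise distinct nodes x_0, ..., x_n."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(nodes, values)):
        term = Fraction(yi)
        for k, xk in enumerate(nodes):
            if k != i:
                term *= Fraction(x - xk, xi - xk)
        total += term
    return total


def P(x):
    return x**3 - 2 * x + 5          # a sample polynomial of degree 3 = n


nodes = [-2, -1, 1, 4]                # n + 1 = 4 pairwise distinct nodes
values = [P(x) for x in nodes]
checks = [lagrange_interpolate(nodes, values, x) == P(x) for x in range(-5, 6)]
print(all(checks))                    # True: the formula recovers P at every test point
```

If P had degree larger than n, the same code would return the unique polynomial of degree at most n taking the prescribed values, which in general differs from P away from the nodes.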


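Let us now do the same thing starting with Problem 10.55. Let $\mathcal{T}_n$ be the space of trigonometric polynomials of degree at most n. By definition,

$$\mathcal{T}_n = \operatorname{Span}(c_0, c_1, \ldots, c_n, s_1, \ldots, s_n),$$

where we recall that

$$c_k(x) = \cos(kx), \qquad s_k(x) = \sin(kx).$$

Thus an element of $\mathcal{T}_n$ is a continuous $2\pi$-periodic map of the form

$$x \mapsto a_0 + \sum_{k=1}^{n} \big(a_k \cos(kx) + b_k \sin(kx)\big)$$

with $a_k, b_k \in \mathbb{R}$. By Problem 10.55 the family

$$\mathcal{F}_n = \left\{ \frac{1}{\sqrt{2\pi}} \right\} \cup \left\{ \frac{1}{\sqrt{\pi}} c_k \;\middle|\; 1 \le k \le n \right\} \cup \left\{ \frac{1}{\sqrt{\pi}} s_k \;\middle|\; 1 \le k \le n \right\}$$

is orthonormal with respect to the inner product

$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) g(x)\,dx$$

on $\mathcal{T}_n$. This family is therefore linearly independent in $\mathcal{T}_n$, and by definition it spans $\mathcal{T}_n$; hence it is an orthonormal basis of $\mathcal{T}_n$. If $f : \mathbb{R} \to \mathbb{R}$ is any continuous $2\pi$-periodic map, we call the sum

$$S_n(f) = \sum_{g \in \mathcal{F}_n} \langle f, g \rangle g$$

the nth partial Fourier series of f. A small calculation shows that we can also write

$$S_n(f)(x) = \frac{a_0(f)}{2} + \sum_{k=1}^{n} \big(a_k(f) \cos(kx) + b_k(f) \sin(kx)\big),$$

where

$$a_m(f) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(mx)\,dx, \qquad b_m(f) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(mx)\,dx$$

are the mth Fourier coefficients of f.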
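To see the coefficients $a_m(f), b_m(f)$ in action, the sketch below approximates them by an equally spaced Riemann sum for the $2\pi$-periodic function equal to $|x|$ on $[-\pi, \pi]$. An integration by parts gives $a_0(f) = \pi$, $a_m(f) = \frac{2((-1)^m - 1)}{\pi m^2}$ for $m \ge 1$ and $b_m(f) = 0$, which the numbers should reproduce. The sample count is an arbitrary choice, and because $|x|$ has corners the quadrature is only second-order accurate here.

```python
import math


def fourier_ab(f, m, samples=20000):
    """Approximate a_m(f), b_m(f) = (1/pi) * integral_{-pi}^{pi} f(x) cos(mx) (resp. sin(mx)) dx
    by an equally spaced Riemann sum over one period."""
    h = 2 * math.pi / samples
    xs = [-math.pi + k * h for k in range(samples)]
    a = h * sum(f(x) * math.cos(m * x) for x in xs) / math.pi
    b = h * sum(f(x) * math.sin(m * x) for x in xs) / math.pi
    return a, b


# f(x) = |x| on [-pi, pi]; its 2*pi-periodic extension is continuous.
for m in range(5):
    a, b = fourier_ab(abs, m)
    exact_a = math.pi if m == 0 else 2 * ((-1) ** m - 1) / (math.pi * m * m)
    print(m, round(a, 6), round(exact_a, 6), round(b, 6))   # b_m = 0 since |x| is even
```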
We can also rewrite the previous results in terms of the complex Fourier coefficients

$$c_n(f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx$$

of f. These are usually referred to simply as the Fourier coefficients of f. A nice exercise for the reader consists in checking that the partial Fourier series can also be expressed as

$$S_n(f)(x) = \sum_{k=-n}^{n} c_k(f) e^{ikx},$$

by first checking that for $m \ge 0$

$$c_m(f) = \frac{a_m(f) - i\, b_m(f)}{2}, \qquad c_{-m}(f) = \frac{a_m(f) + i\, b_m(f)}{2}$$

(with the convention $b_0(f) = 0$).

Note that relation (10.4) says that

$$f = S_n(f) \quad \text{if } f \in \mathcal{T}_n,$$

but of course we do not have $f = S_n(f)$ for every continuous $2\pi$-periodic function f. One may wonder what the actual relationship is between f and the partial Fourier series of f. The naive guess would be that

$$\lim_{n \to \infty} S_n(f)(x) = f(x)$$

for every continuous $2\pi$-periodic map $f : \mathbb{R} \to \mathbb{R}$ and every x. This is not true in general, but there are many situations in which it does hold: a deep theorem in Fourier analysis due to Dirichlet says that if f and its derivative are piecewise continuous, then for all x we have

$$\lim_{n \to \infty} S_n(f)(x) = \frac{f(x^+) + f(x^-)}{2},$$

where $f(x^+)$ and $f(x^-)$ are the one-sided limits of f at x. Thus if moreover f is continuous, we do have

$$\lim_{n \to \infty} S_n(f)(x) = f(x),$$

which we can write as

$$f(x) = \sum_{k \in \mathbb{Z}} c_k(f) e^{ikx} = \frac{a_0(f)}{2} + \sum_{k \ge 1} \big(a_k(f)\cos(kx) + b_k(f)\sin(kx)\big),$$

the sum over $\mathbb{Z}$ being understood as the limit of the symmetric partial sums $\sum_{k=-n}^{n}$.
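Dirichlet's theorem can be watched numerically. Assume f is the continuous $2\pi$-periodic function equal to $|x|$ on $[-\pi, \pi]$; integration by parts gives $a_0(f) = \pi$, $a_k(f) = -\frac{4}{\pi k^2}$ for odd k, and all other coefficients equal to 0. The sketch below (illustration only) evaluates $S_n(f)(1)$ from these closed forms. Since $\sum_{k > n} \frac{1}{k^2} < \frac1n$, the error at any point is at most $\frac{4}{\pi n}$.

```python
import math


def partial_sum_abs(n, x):
    """S_n(f)(x) for the 2*pi-periodic f with f(x) = |x| on [-pi, pi], using
    a_0 = pi, a_k = -4/(pi k^2) for odd k, a_k = 0 for even k >= 2, b_k = 0."""
    s = math.pi / 2                                   # a_0 / 2
    for k in range(1, n + 1, 2):                      # only odd k contribute
        s -= 4 / (math.pi * k * k) * math.cos(k * x)
    return s


x = 1.0
for n in (1, 5, 25, 125, 625):
    error = abs(partial_sum_abs(n, x) - abs(x))
    print(n, error, error <= 4 / (math.pi * n))       # bound from sum_{k>n} 1/k^2 < 1/n
```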


Orthogonal bases are extremely useful in practice, as we can easily compute orthogonal projections and distances once we have at our disposal an orthogonal basis of the space we are interested in. More precisely, we have the following very useful result.

Theorem 10.58. Let V be a vector space over $\mathbb{R}$ endowed with an inner product $\langle\ ,\ \rangle$ and let W be a finite dimensional subspace of V. Let $v_1, \ldots, v_n$ be an orthogonal basis of W. Then for all $v \in V$ we have

$$p_W(v) = \sum_{i=1}^{n} \frac{\langle v, v_i \rangle}{\|v_i\|^2} v_i.$$

Proof. Let us write $v = p_W(v) + u$, with $u \in W^{\perp}$, that is $\langle u, v_i \rangle = 0$ for all $i \in [1, n]$. Letting $p_W(v) = \alpha_1 v_1 + \cdots + \alpha_n v_n$ and using the fact that $v_1, \ldots, v_n$ is an orthogonal family, we obtain

$$0 = \langle u, v_i \rangle = \langle v, v_i \rangle - \langle p_W(v), v_i \rangle = \langle v, v_i \rangle - \sum_{j=1}^{n} \alpha_j \langle v_j, v_i \rangle = \langle v, v_i \rangle - \alpha_i \|v_i\|^2.$$

It follows that

$$\alpha_i = \frac{\langle v, v_i \rangle}{\|v_i\|^2}$$

and the theorem is proved. □
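Theorem 10.58 gives a direct recipe that never requires normalizing the basis. A minimal sketch, assuming the canonical inner product on $\mathbb{R}^4$ and a hypothetical orthogonal basis $(1,1,0,0), (1,-1,1,0)$ of a plane W: it computes $p_W(v)$ exactly and confirms that $v - p_W(v)$ is orthogonal to W.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(v, orthogonal_basis):
    """p_W(v) = sum_i <v, v_i> / ||v_i||^2 * v_i, for an orthogonal basis (v_i) of W."""
    p = [Fraction(0)] * len(v)
    for w in orthogonal_basis:
        coeff = Fraction(dot(v, w), dot(w, w))
        p = [pk + coeff * wk for pk, wk in zip(p, w)]
    return p


basis = [(1, 1, 0, 0), (1, -1, 1, 0)]     # orthogonal, but not normalized
v = (3, 1, 2, 5)
p = project(v, basis)
u = [vk - pk for vk, pk in zip(v, p)]
print([str(x) for x in p])                 # ['10/3', '2/3', '4/3', '0']
print([dot(u, w) == 0 for w in basis])     # [True, True]: v - p_W(v) lies in W-perp
```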
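We can say quite a bit more. The inequality in the theorem below is called Bessel's inequality.

Theorem 10.59. Let V be a vector space over $\mathbb{R}$ endowed with an inner product $\langle\ ,\ \rangle$, and let W be a finite dimensional subspace of V. If $v_1, \ldots, v_n$ is an orthonormal basis of W, then for all $v \in V$ we have

$$p_W(v) = \sum_{i=1}^{n} \langle v, v_i \rangle v_i$$

and

$$d(v, W)^2 = \Big\| v - \sum_{i=1}^{n} \langle v, v_i \rangle v_i \Big\|^2 = \|v\|^2 - \sum_{i=1}^{n} \langle v, v_i \rangle^2.$$

In particular we have

$$\sum_{i=1}^{n} \langle v, v_i \rangle^2 \le \|v\|^2.$$

Proof. The formula for $p_W(v)$ is a direct consequence of the previous theorem. Next, by the Pythagorean theorem,

$$\|v\|^2 = \|v - p_W(v)\|^2 + \|p_W(v)\|^2.$$

On the other hand, since $v_1, \ldots, v_n$ is an orthonormal basis, we have

$$\|p_W(v)\|^2 = \Big\| \sum_{i=1}^{n} \langle v, v_i \rangle v_i \Big\|^2 = \sum_{i,j=1}^{n} \big\langle \langle v, v_i \rangle v_i, \langle v, v_j \rangle v_j \big\rangle = \sum_{i,j=1}^{n} \langle v, v_i \rangle \langle v, v_j \rangle \langle v_i, v_j \rangle = \sum_{i,j=1}^{n} \delta_{i,j} \langle v, v_i \rangle \langle v, v_j \rangle = \sum_{i=1}^{n} \langle v, v_i \rangle^2.$$

Combining these equalities yields

$$d(v, W)^2 = \Big\| v - \sum_{i=1}^{n} \langle v, v_i \rangle v_i \Big\|^2 = \|v\|^2 - \sum_{i=1}^{n} \langle v, v_i \rangle^2.$$

Finally, since $d(v, W)^2 \ge 0$, the last inequality is a direct consequence of the previous formula. □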
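Both statements of Theorem 10.59 can be checked exactly on a small example. The sketch below assumes the canonical inner product on $\mathbb{R}^3$ and the orthonormal family $e_1 = \frac13(1,2,2)$, $e_2 = \frac13(2,1,-2)$ (an arbitrary choice with rational entries). It computes $d(v,W)^2$ directly from the residual $v - p_W(v)$ and compares it with $\|v\|^2 - \sum_i \langle v, e_i \rangle^2$.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


third = Fraction(1, 3)
e1 = [third * 1, third * 2, third * 2]      # orthonormal family spanning a plane W
e2 = [third * 2, third * 1, third * -2]
v = [Fraction(5), Fraction(-1), Fraction(4)]

coeffs = [dot(v, e) for e in (e1, e2)]
p = [coeffs[0] * a + coeffs[1] * b for a, b in zip(e1, e2)]     # p_W(v)
residual = [vk - pk for vk, pk in zip(v, p)]

dist_sq = dot(residual, residual)           # d(v, W)^2 computed directly
bessel_sum = sum(c * c for c in coeffs)     # sum of <v, e_i>^2
print(dist_sq == dot(v, v) - bessel_sum, bessel_sum <= dot(v, v))   # True True
```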


Remark 10.60. Let V be a vector space over $\mathbb{R}$ endowed with an inner product and let $(v_i)_{i \in I}$ be an orthogonal family. If $(a_i)_{i \in I}$ are real numbers, all but finitely many being equal to 0, then

$$\Big\| \sum_{i \in I} a_i v_i \Big\|^2 = \sum_{i \in I} a_i^2 \|v_i\|^2.$$

In particular, if $(v_i)_{i \in I}$ is an orthonormal family, then

$$\Big\| \sum_{i \in I} a_i v_i \Big\|^2 = \sum_{i \in I} a_i^2.$$

This can be proved in the same way as the previous theorem: we have

$$\Big\| \sum_{i \in I} a_i v_i \Big\|^2 = \Big\langle \sum_{i \in I} a_i v_i, \sum_{j \in I} a_j v_j \Big\rangle = \sum_{i, j \in I} a_i a_j \langle v_i, v_j \rangle = \sum_{i \in I} a_i^2 \langle v_i, v_i \rangle = \sum_{i \in I} a_i^2 \|v_i\|^2,$$

since the family is orthogonal. Note that these algebraic manipulations are allowed since we assumed that all but finitely many of the $a_i$'s are zero; thus we never manipulate infinite sums.


Remark 10.61. Let us come back to the discussion preceding the previous theorem. If $f : \mathbb{R} \to \mathbb{R}$ is a continuous $2\pi$-periodic map, we deduce from that discussion and the previous theorem that $S_n(f)$ (the nth partial Fourier series of f) is the orthogonal projection of f on the space $\mathcal{T}_n$ of trigonometric polynomials of degree at most n and that

$$\sum_{g \in \mathcal{F}_n} \langle f, g \rangle^2 \le \|f\|^2 = \int_{-\pi}^{\pi} f(x)^2\,dx.$$

This can be rewritten in terms of the Fourier coefficients $a_m(f), b_m(f)$ of f as

$$\frac{a_0(f)^2}{2} + \sum_{k=1}^{n} \big(a_k(f)^2 + b_k(f)^2\big) \le \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)^2\,dx.$$

Since this holds for all n, we deduce that the series

$$\sum_{k \ge 1} \big(a_k(f)^2 + b_k(f)^2\big)$$

converges and

$$\frac{a_0(f)^2}{2} + \sum_{k \ge 1} \big(a_k(f)^2 + b_k(f)^2\big) \le \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)^2\,dx.$$

The convergence of the previous series yields

$$\lim_{n \to \infty} a_n(f) = \lim_{n \to \infty} b_n(f) = 0,$$

a nontrivial result known as the Riemann-Lebesgue theorem. On the other hand, one can prove (this is the famous Plancherel theorem) that the previous inequality is actually an equality; that is, for all continuous $2\pi$-periodic maps f we have

$$\frac{a_0(f)^2}{2} + \sum_{k \ge 1} \big(a_k(f)^2 + b_k(f)^2\big) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)^2\,dx.$$

The proof is beyond the scope of this book. A good exercise for the reader is to verify that Plancherel's theorem can be rewritten as

$$\sum_{k \in \mathbb{Z}} |c_k(f)|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)^2\,dx,$$

where we recall that

$$c_k(f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx.$$

Plancherel's theorem also holds for functions $f : \mathbb{R} \to \mathbb{R}$ which are piecewise continuous and $2\pi$-periodic.
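Plancherel's identity can be illustrated numerically. Assume f is the $2\pi$-periodic function equal to $|x|$ on $[-\pi, \pi]$; integration by parts gives $a_0(f) = \pi$, $a_k(f) = -\frac{4}{\pi k^2}$ for odd k, and all other coefficients equal to 0. The sketch below prints the gap between $\frac1\pi\int_{-\pi}^{\pi} f(x)^2\,dx = \frac{2\pi^2}{3}$ and the partial sums $\frac{a_0(f)^2}{2} + \sum_{k \le n}(a_k(f)^2 + b_k(f)^2)$: Bessel's inequality says the gap is nonnegative, and Plancherel's theorem says it tends to 0.

```python
import math


def plancherel_lhs(n):
    """a_0^2/2 + sum_{k<=n} (a_k^2 + b_k^2) for f(x) = |x| on [-pi, pi], where
    a_0 = pi, a_k = -4/(pi k^2) for odd k and all other coefficients vanish."""
    s = math.pi ** 2 / 2
    for k in range(1, n + 1, 2):
        s += (4 / (math.pi * k * k)) ** 2
    return s


rhs = (1 / math.pi) * (2 * math.pi ** 3 / 3)     # (1/pi) * integral_{-pi}^{pi} x^2 dx
for n in (1, 11, 101, 1001):
    print(n, rhs - plancherel_lhs(n))            # >= 0 (Bessel), tends to 0 (Plancherel)
```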


Problem 10.62. a) Determine an orthogonal basis of $\mathbb{R}^3$ containing the vector $w = (1, 2, -1)$.

b) Let W be the subspace of $\mathbb{R}^3$ spanned by w. Find the projection of $v = (1, 2, 1)$ onto the orthogonal complement of W.

Solution. a) We look for an orthogonal basis $w, v_1, v_2$ of $\mathbb{R}^3$. In particular $v_1, v_2$ should be an orthogonal basis of $(\mathbb{R}w)^{\perp}$. A vector $v = (x, y, z)$ belongs to $(\mathbb{R}w)^{\perp}$ if and only if

$$0 = \langle v, w \rangle = x + 2y - z.$$

Thus we must have

$$v_1 = (x_1, y_1, x_1 + 2y_1), \qquad v_2 = (x_2, y_2, x_2 + 2y_2)$$

for some real numbers $x_1, x_2, y_1, y_2$. Moreover, we should have $\langle v_1, v_2 \rangle = 0$ and $v_1, v_2$ should be nonzero: this automatically implies that $v_1, v_2$ are linearly independent (because $\langle v_1, v_2 \rangle = 0$), and so they form a basis of the two-dimensional space $(\mathbb{R}w)^{\perp}$. The condition $\langle v_1, v_2 \rangle = 0$ is equivalent to

$$x_1 x_2 + y_1 y_2 + (x_1 + 2y_1)(x_2 + 2y_2) = 0.$$

We see that we have lots of choices: to keep things simple we choose $x_1 + 2y_1 = 0$, for instance $y_1 = 1$ and $x_1 = -2$. Then the condition becomes $-2x_2 + y_2 = 0$, so we choose $x_2 = 1$ and $y_2 = 2$. This gives

$$v_1 = (-2, 1, 0), \qquad v_2 = (1, 2, 5).$$

We insist that this is only one of the many possible answers to the problem.

b) As we have already seen in part a), the orthogonal complement of W is exactly $\operatorname{Span}(v_1, v_2)$, and an orthogonal basis of $W^{\perp}$ is given by $v_1, v_2$. Applying Theorem 10.58 yields

$$p_{W^{\perp}}(v) = \frac{\langle v, v_1 \rangle}{\|v_1\|^2} v_1 + \frac{\langle v, v_2 \rangle}{\|v_2\|^2} v_2 = 0 \cdot v_1 + \frac{10}{30} v_2 = \frac{1}{3}(1, 2, 5).$$

We could have done this in a much easier way as follows: instead of computing $p_{W^{\perp}}(v)$ directly, we first compute $p_W(v)$. Now an orthogonal basis of W is given by w, thus

$$p_W(v) = \frac{\langle v, w \rangle}{\|w\|^2} w = \frac{4}{6} w = \left( \frac{2}{3}, \frac{4}{3}, -\frac{2}{3} \right).$$

Next, we have

$$p_{W^{\perp}}(v) = v - p_W(v) = \left( \frac{1}{3}, \frac{2}{3}, \frac{5}{3} \right). \qquad \square$$
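Both routes of part b) can be compared mechanically. The following sketch (an illustration, with hypothetical helper names) uses exact fractions and the formula of Theorem 10.58 to project $v = (1,2,1)$ onto $W^\perp$ through the basis $v_1, v_2$ and, alternatively, as $v - p_W(v)$.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(v, orthogonal_basis):
    """Theorem 10.58: p_W(v) = sum_i <v, v_i>/||v_i||^2 v_i for an orthogonal basis."""
    p = [Fraction(0)] * len(v)
    for w in orthogonal_basis:
        c = Fraction(dot(v, w), dot(w, w))
        p = [pk + c * wk for pk, wk in zip(p, w)]
    return p


w = (1, 2, -1)
v1, v2 = (-2, 1, 0), (1, 2, 5)
assert dot(w, v1) == dot(w, v2) == dot(v1, v2) == 0     # w, v1, v2 is an orthogonal basis

v = (1, 2, 1)
direct = project(v, [v1, v2])                            # projection onto W-perp
via_w = [vk - pk for vk, pk in zip(v, project(v, [w]))]  # v - p_W(v)
print(direct == via_w, [str(x) for x in direct])         # True ['1/3', '2/3', '5/3']
```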


The previous results concerning orthonormal bases show rather clearly the 
crucial role played by these objects. Yet, we avoided a natural and very important 
question: can we always find an orthonormal basis? The answer is given by the 
following fundamental theorem. We do not give its proof right now since we will 
prove a much stronger result in just a few moments. 


Theorem 10.63. Any Euclidean space has an orthonormal basis. 


The following theorem refines Theorem 10.63 and gives an algorithmic construction of an orthonormal basis of a Euclidean space starting with an arbitrary basis of the corresponding vector space. It is absolutely fundamental:


Theorem 10.64 (Gram-Schmidt). Let $v_1, \ldots, v_d$ be linearly independent vectors in a vector space V over $\mathbb{R}$ (not necessarily finite dimensional), endowed with an inner product $\langle\ ,\ \rangle$. Then there is a unique orthonormal family $e_1, \ldots, e_d$ in V with the property that for all $k \in [1, d]$ we have

$$\operatorname{Span}(e_1, \ldots, e_k) = \operatorname{Span}(v_1, \ldots, v_k) \quad \text{and} \quad \langle e_k, v_k \rangle > 0.$$
Proof. We will prove the theorem by induction on d. Let us start with the case $d = 1$. Suppose that $e_1$ is a vector satisfying the conditions imposed by the theorem. Since $e_1 \in \mathbb{R}v_1$, we can write $e_1 = \lambda v_1$ for some real number $\lambda$. Then $\langle e_1, v_1 \rangle = \lambda \|v_1\|^2$ is positive, thus $\lambda > 0$. Next, $\|e_1\| = 1$, thus $|\lambda| = \frac{1}{\|v_1\|}$, and so necessarily $\lambda = \frac{1}{\|v_1\|}$ and $e_1 = \frac{v_1}{\|v_1\|}$. Conversely, this vector satisfies the desired properties, which proves the theorem when $d = 1$.

Assume now that $d \ge 2$ and that the theorem holds for $d - 1$. Let $v_1, \ldots, v_d$ be linearly independent vectors in V. By the inductive hypothesis we know that there is a unique orthonormal family $e_1, \ldots, e_{d-1}$ satisfying the conditions of the theorem with respect to the family $v_1, \ldots, v_{d-1}$. It therefore suffices to prove that there is a unique vector $e_d$ such that $e_1, \ldots, e_d$ satisfies the conditions of the theorem with respect to $v_1, \ldots, v_d$, that is, such that

$$\|e_d\| = 1, \qquad \langle e_d, e_i \rangle = 0 \quad \forall\, 1 \le i \le d-1, \qquad \langle e_d, v_d \rangle > 0$$

and

$$\operatorname{Span}(e_1, \ldots, e_d) = \operatorname{Span}(v_1, \ldots, v_d).$$

Assume first that $e_d$ is such a vector. Then

$$e_d \in \operatorname{Span}(e_1, \ldots, e_d) = \operatorname{Span}(v_1, \ldots, v_d) = \mathbb{R}v_d + \operatorname{Span}(v_1, \ldots, v_{d-1}) = \mathbb{R}v_d + \operatorname{Span}(e_1, \ldots, e_{d-1}).$$

Thus we can write

$$e_d = \lambda v_d + \sum_{i=1}^{d-1} a_i e_i$$

for some real numbers $\lambda, a_1, \ldots, a_{d-1}$. Then for all $i \in [1, d-1]$ we have (since $e_1, \ldots, e_{d-1}$ is an orthonormal family)

$$0 = \langle e_d, e_i \rangle = \lambda \langle v_d, e_i \rangle + \sum_{j=1}^{d-1} a_j \langle e_j, e_i \rangle = \lambda \langle v_d, e_i \rangle + a_i,$$

thus the coefficients $a_i = -\lambda \langle v_d, e_i \rangle$ are uniquely determined if $\lambda$ is. Next, we have

$$e_d = \lambda \Big( v_d - \sum_{i=1}^{d-1} \langle v_d, e_i \rangle e_i \Big).$$

Note that $z := v_d - \sum_{i=1}^{d-1} \langle v_d, e_i \rangle e_i$ is nonzero, since otherwise $v_d \in \operatorname{Span}(e_1, \ldots, e_{d-1}) = \operatorname{Span}(v_1, \ldots, v_{d-1})$, contradicting the hypothesis that $v_1, \ldots, v_d$ are linearly independent. Now $\|e_d\| = 1$ forces $|\lambda| = \frac{1}{\|z\|}$, and the condition $\langle e_d, v_d \rangle > 0$ shows that the sign of $\lambda$ is uniquely determined and is actually positive: since $e_d = \lambda z$ is orthogonal to $e_1, \ldots, e_{d-1}$,

$$\langle e_d, v_d \rangle = \Big\langle e_d, z + \sum_{i=1}^{d-1} \langle v_d, e_i \rangle e_i \Big\rangle = \langle e_d, z \rangle = \lambda \|z\|^2.$$

We deduce that $\lambda$ is uniquely determined by

$$\lambda = \frac{1}{\|z\|},$$

and the uniqueness follows.

Conversely, we can define $\lambda = \frac{1}{\|z\|}$ and $e_d = \lambda z$. The previous computations show that $e_d$ satisfies all required properties, and this proves the existence part and finishes the proof of the inductive step. □


Remark 10.65. a) Let us try to understand the proof geometrically (i.e., let us give a less computational and more conceptual proof of the theorem). Assuming that we constructed $e_1, \ldots, e_{d-1}$, we would like to understand how to construct $e_d$. This vector $e_d$ must be orthogonal to $e_1, \ldots, e_{d-1}$ and it must belong to $W = \operatorname{Span}(v_1, \ldots, v_d)$. It follows that $e_d$ must lie in the orthogonal complement of $\operatorname{Span}(e_1, \ldots, e_{d-1}) = \operatorname{Span}(v_1, \ldots, v_{d-1})$ inside W. However

$$\dim \big(\operatorname{Span}(v_1, \ldots, v_{d-1})^{\perp} \cap W\big) = \dim \operatorname{Span}(v_1, \ldots, v_d) - \dim \operatorname{Span}(v_1, \ldots, v_{d-1}) = d - (d-1) = 1,$$

thus $e_d$ is uniquely determined up to a scalar. Since we further want $e_d$ to be of norm 1, this pins down $e_d$ up to a sign. Finally, the condition $\langle e_d, v_d \rangle > 0$ determines the sign uniquely, and so determines $e_d$ uniquely.

b) Part a) (and the proof of the theorem as well) gives the following algorithm, known as the Gram-Schmidt process, which constructs $e_1, \ldots, e_d$ starting from $v_1, \ldots, v_d$. Set $f_1 = v_1$ and $e_1 = \frac{f_1}{\|f_1\|}$; then, assuming that we constructed $f_1, \ldots, f_{k-1}$ and $e_1, \ldots, e_{k-1}$, let

$$f_k = v_k - \sum_{i=1}^{k-1} \langle v_k, e_i \rangle e_i \quad \text{and} \quad e_k = \frac{f_k}{\|f_k\|}.$$

That is, at each step we subtract from $v_k$ its orthogonal projection $\sum_{i=1}^{k-1} \langle v_k, e_i \rangle e_i$ onto $\operatorname{Span}(e_1, \ldots, e_{k-1})$ and obtain in this way $f_k$. Then we normalize $f_k$ to get $e_k$. Note that in practice it can be very useful to observe that we can compute $\|f_k\|$ via

$$\|f_k\|^2 = \langle f_k, v_k \rangle.$$

This formula follows from the fact that $v_k = f_k + \sum_{i=1}^{k-1} \langle v_k, e_i \rangle e_i$ and $e_i$ is orthogonal to $f_k$ for $1 \le i \le k-1$.
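The recursion of part b) is short to implement. The following is a minimal Python sketch of the classical Gram-Schmidt process for the canonical inner product on $\mathbb{R}^n$, using the shortcut $\|f_k\|^2 = \langle f_k, v_k \rangle$ for the normalization. The function name `gram_schmidt` and the tolerance `1e-12` used to detect dependent input are assumptions of this sketch. In floating-point arithmetic the classical process can lose orthogonality for nearly dependent vectors; the modified variant, which projects the running vector $f$ rather than $v_k$, is usually preferred in numerical work.

```python
import math


def dot(u, v):
    """Canonical inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Classical Gram-Schmidt: f_k = v_k - sum_{i<k} <v_k, e_i> e_i, e_k = f_k / ||f_k||.

    Raises ValueError when some ||f_k||^2 falls below tol (dependent input)."""
    es = []
    for v in vectors:
        f = [float(x) for x in v]
        for e in es:
            c = dot(v, e)                                  # <v_k, e_i>
            f = [fk - c * ek for fk, ek in zip(f, e)]
        norm_sq = dot(f, v)                                # ||f_k||^2 = <f_k, v_k>
        if norm_sq <= tol:
            raise ValueError("the vectors are (numerically) linearly dependent")
        norm = math.sqrt(norm_sq)
        es.append([fk / norm for fk in f])
    return es


es = gram_schmidt([(1, 0, 1), (1, 1, 0), (0, 1, 1)])
for e in es:
    print([round(x, 4) for x in e], [round(dot(e, g), 10) for g in es])
```

For this sample input the output vectors are $\frac{1}{\sqrt2}(1,0,1)$, $\frac{1}{\sqrt6}(1,2,-1)$ and $\frac{1}{\sqrt3}(-1,1,1)$.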


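Example 10.66. Let us consider the vectors

$$v_1 = (1, 1, 1), \quad v_2 = (0, 2, 1), \quad v_3 = (3, 1, 3) \in \mathbb{R}^3.$$

An easy computation shows that the determinant of the matrix whose columns are $v_1, v_2, v_3$ is nonzero (it equals 2), thus $v_1, v_2, v_3$ are linearly independent. Let us follow the Gram-Schmidt process and find the corresponding orthonormal basis of $\mathbb{R}^3$. We set

$$e_1 = \frac{v_1}{\|v_1\|} = \frac{v_1}{\sqrt{3}} = \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right).$$

Next, set

$$f_2 = v_2 - \langle v_2, e_1 \rangle e_1 = v_2 - \sqrt{3}\, e_1 = v_2 - (1, 1, 1) = (-1, 1, 0)$$

and

$$e_2 = \frac{f_2}{\|f_2\|} = \frac{f_2}{\sqrt{2}} = \left( -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0 \right).$$

Finally, set

$$f_3 = v_3 - \langle v_3, e_1 \rangle e_1 - \langle v_3, e_2 \rangle e_2 = v_3 - \frac{7}{\sqrt{3}} e_1 + \sqrt{2}\, e_2 = (3, 1, 3) - \left( \frac{7}{3}, \frac{7}{3}, \frac{7}{3} \right) + (-1, 1, 0) = \left( -\frac{1}{3}, -\frac{1}{3}, \frac{2}{3} \right)$$

and

$$e_3 = \frac{f_3}{\|f_3\|} = \frac{1}{\sqrt{6}} (-1, -1, 2).$$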
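The computations of Example 10.66 can be reproduced in exact arithmetic if one postpones the square roots. By Theorem 10.58, the projection of $v_k$ onto $\operatorname{Span}(e_1, \ldots, e_{k-1}) = \operatorname{Span}(f_1, \ldots, f_{k-1})$ equals $\sum_{i<k} \frac{\langle v_k, f_i \rangle}{\|f_i\|^2} f_i$, so the sketch below (hypothetical function name `orthogonalize`) produces the same vectors $f_k$ using only rational numbers; then $e_k = f_k / \|f_k\|$. It also checks the shortcut $\|f_k\|^2 = \langle f_k, v_k \rangle$ from Remark 10.65 b).

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(vectors):
    """Gram-Schmidt without the normalization step, in exact arithmetic:
    f_k = v_k - sum_{i<k} <v_k, f_i>/||f_i||^2 f_i, so that e_k = f_k / ||f_k||."""
    fs = []
    for v in vectors:
        f = [Fraction(x) for x in v]
        for g in fs:
            c = Fraction(dot(v, g)) / dot(g, g)
            f = [fk - c * gk for fk, gk in zip(f, g)]
        fs.append(f)
    return fs


vs = [(1, 1, 1), (0, 2, 1), (3, 1, 3)]
fs = orthogonalize(vs)
for v, f in zip(vs, fs):
    # ||f_k||^2 equals <f_k, v_k>, as noted in Remark 10.65 b)
    print([str(x) for x in f], str(dot(f, f)), dot(f, f) == dot(f, v))
```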
Problem 10.67. Let V be the space of polynomials with real coefficients whose degree does not exceed 2, endowed with the inner product defined by

$$\langle P, Q \rangle = \int_{-1}^{1} P(x) Q(x)\,dx.$$

Find the orthonormal basis of V obtained by applying the Gram-Schmidt process to the basis $1, X, X^2$ of V.

Solution. We start with $v_1 = 1$, $v_2 = X$ and $v_3 = X^2$ and apply the Gram-Schmidt process. We obtain

$$\|v_1\| = \sqrt{2}, \qquad e_1 = \frac{1}{\sqrt{2}},$$

then

$$f_2 = v_2 - \langle v_2, e_1 \rangle e_1 = X - \frac{1}{2} \int_{-1}^{1} x\,dx = X$$

and

$$\|f_2\|^2 = \langle f_2, v_2 \rangle = \int_{-1}^{1} x^2\,dx = \frac{2}{3},$$

thus

$$e_2 = \frac{f_2}{\|f_2\|} = \sqrt{\frac{3}{2}}\, X.$$

Finally,

$$f_3 = v_3 - \langle v_3, e_1 \rangle e_1 - \langle v_3, e_2 \rangle e_2 = X^2 - \frac{1}{2} \int_{-1}^{1} x^2\,dx - \frac{3}{2} \left( \int_{-1}^{1} x^3\,dx \right) X = X^2 - \frac{1}{3}$$

and

$$\|f_3\|^2 = \langle f_3, v_3 \rangle = \int_{-1}^{1} x^2 \left( x^2 - \frac{1}{3} \right) dx = \frac{8}{45}.$$

Thus

$$e_3 = \frac{f_3}{\|f_3\|} = \sqrt{\frac{5}{2}} \cdot \frac{3X^2 - 1}{2}.$$
Hence the answer is

$$\frac{1}{\sqrt{2}}, \qquad \sqrt{\frac{3}{2}}\, X, \qquad \sqrt{\frac{5}{2}} \cdot \frac{3X^2 - 1}{2}. \qquad \square$$
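The same unnormalized recursion works in any inner product space. The sketch below (illustrative; polynomials are coefficient lists and `integrate`, `mul`, `inner`, `orthogonalize` are hypothetical helpers) applies it to $1, X, X^2, X^3$ with $\langle P, Q \rangle = \int_{-1}^1 P(x)Q(x)\,dx$, computing integrals exactly via $\int_{-1}^1 x^k\,dx = \frac{2}{k+1}$ for even k and 0 for odd k. It recovers $f_3 = X^2 - \frac13$ with $\|f_3\|^2 = \frac{8}{45}$, as in the solution above, and continues one step further to $X^3 - \frac35 X$.

```python
from fractions import Fraction


def integrate(p):
    """Exact integral over [-1, 1] of the polynomial with coefficient list p."""
    return sum(Fraction(c) * (1 - (-1) ** (k + 1)) / (k + 1) for k, c in enumerate(p))


def mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += Fraction(a) * b
    return r


def inner(p, q):
    return integrate(mul(p, q))          # <P, Q> = integral_{-1}^{1} P(x) Q(x) dx


def orthogonalize(polys):
    """Unnormalized Gram-Schmidt: f_k = v_k - sum_{i<k} <v_k, f_i>/||f_i||^2 f_i.
    Assumes the inputs have nondecreasing length (degree)."""
    fs = []
    for v in polys:
        f = [Fraction(c) for c in v]
        for g in fs:
            c = inner(v, g) / inner(g, g)
            g_padded = g + [Fraction(0)] * (len(f) - len(g))
            f = [fk - c * gk for fk, gk in zip(f, g_padded)]
        fs.append(f)
    return fs


monomials = [[1], [0, 1], [0, 0, 1], [0, 0, 0, 1]]      # 1, X, X^2, X^3
for f in orthogonalize(monomials):
    print([str(c) for c in f], "norm^2 =", inner(f, f))
```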


The following problem is a generalization of the previous one. It is much more challenging and represents an introduction to the beautiful theory of orthogonal polynomials.


Problem 10.68 (Legendre's Polynomials). Let $n \ge 1$ and let V be the space of polynomials with real coefficients whose degree does not exceed n, endowed with the inner product

$$\langle P, Q \rangle = \int_{-1}^{1} P(x) Q(x)\,dx.$$

For $k \ge 0$, let $L_k$ be the kth derivative of $(X^2 - 1)^k$.

a) Prove that $L_0, \ldots, L_n$ is an orthogonal basis of V.

b) Compute $\|L_k\|$.

c) What is the orthonormal basis of V obtained by applying the Gram-Schmidt process to the canonical basis $1, X, \ldots, X^n$ of V?

Solution. For $j \in [0, n]$ let $P_j = (X^2 - 1)^j$ and note that $-1$ and $1$ are roots with multiplicity j of $P_j$. It follows that for $k \in [0, j]$, $-1$ and $1$ are roots with multiplicity $j - k$ of $P_j^{(k)}$ (the kth derivative of $P_j$). If $P \in V$, we deduce from this observation and integration by parts that for $j \ge 1$

$$\langle L_j, P \rangle = \int_{-1}^{1} P_j^{(j)}(x) P(x)\,dx = \Big[ P_j^{(j-1)}(x) P(x) \Big]_{-1}^{1} - \int_{-1}^{1} P_j^{(j-1)}(x) P'(x)\,dx = -\int_{-1}^{1} P_j^{(j-1)}(x) P'(x)\,dx,$$

and repeating this argument gives

$$\langle L_j, P \rangle = (-1)^k \int_{-1}^{1} P_j^{(j-k)}(x) P^{(k)}(x)\,dx$$

for $k \in [0, j]$ and $P \in V$. Taking $k = j$ yields the fundamental relation

$$\langle L_k, P \rangle = (-1)^k \int_{-1}^{1} (x^2 - 1)^k P^{(k)}(x)\,dx = \int_{-1}^{1} (1 - x^2)^k P^{(k)}(x)\,dx \tag{10.5}$$

for $k \in [0, n]$ and $P \in V$.
It is now rather easy to deal with the problem. 
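a) For $j < k$ we have, by relation (10.5),

$$\langle L_k, L_j \rangle = \int_{-1}^{1} (1 - x^2)^k L_j^{(k)}(x)\,dx.$$

Since $\deg L_j = j < k$, we have $L_j^{(k)} = 0$ and so $\langle L_k, L_j \rangle = 0$, proving that $L_0, \ldots, L_n$ is an orthogonal family. Since $\deg L_k = k$, these $n + 1$ polynomials are nonzero, so by Corollary 10.57 they form an orthogonal basis of V.

b) By definition $L_n$ has degree n and

$$L_n^{(n)} = \big((X^2 - 1)^n\big)^{(2n)} = \big(X^{2n}\big)^{(2n)} = 2n(2n-1)\cdots 1 = (2n)!.$$

We deduce from relation (10.5) that

$$\|L_n\|^2 = \langle L_n, L_n \rangle = (2n)! \int_{-1}^{1} (1 - x^2)^n\,dx = 2(2n)! \int_0^1 (1 - x^2)^n\,dx.$$

Let

$$I_n = \int_0^1 (1 - x^2)^n\,dx$$

and observe that an integration by parts yields

$$I_n = \Big[ x(1 - x^2)^n \Big]_0^1 - \int_0^1 x \cdot n(1 - x^2)^{n-1}(-2x)\,dx = 2n \int_0^1 x^2 (1 - x^2)^{n-1}\,dx = 2n \int_0^1 \big(1 - (1 - x^2)\big)(1 - x^2)^{n-1}\,dx = 2n(I_{n-1} - I_n),$$

thus

$$(2n + 1) I_n = 2n I_{n-1}.$$

Taking into account that $I_0 = 1$, we obtain

$$I_n = \prod_{i=1}^{n} \frac{I_i}{I_{i-1}} = \prod_{i=1}^{n} \frac{2i}{2i+1} = \frac{2^n n!}{3 \cdot 5 \cdots (2n+1)} = \frac{2^{2n} (n!)^2}{(2n+1)!}.$$

Finally

$$\|L_n\|^2 = 2(2n)!\, I_n = \frac{2^{2n+1} (n!)^2}{2n+1}$$

and

$$\|L_n\| = 2^n n! \sqrt{\frac{2}{2n+1}}.$$

c) Let $Q_k = \frac{L_k}{\|L_k\|}$. Then by part a) the family $Q_0, \ldots, Q_n$ is orthonormal in V, and since $\dim V = n + 1$, it follows that $Q_0, \ldots, Q_n$ is an orthonormal basis of V. Moreover, we have $\deg Q_k = \deg L_k = k$ for $k \in [0, n]$, which easily implies that

$$\operatorname{Span}(Q_0, \ldots, Q_k) = \operatorname{Span}(X^0, \ldots, X^k)$$

for $k \in [0, n]$. Finally, by relation (10.5) applied to $P = X^k$,

$$\langle Q_k, X^k \rangle = \frac{\langle X^k, L_k \rangle}{\|L_k\|} = \frac{k! \int_{-1}^{1} (1 - x^2)^k\,dx}{\|L_k\|} > 0,$$

since $\int_{-1}^{1} (1 - x^2)^k\,dx$ is a positive real number. We conclude that $Q_0, \ldots, Q_n$ is obtained from $1, X, \ldots, X^n$ by applying the Gram-Schmidt process. □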
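Parts a) and b) can be checked exactly for small n. In the sketch below (illustrative, with hypothetical helpers acting on integer coefficient lists), $L_k$ is obtained by expanding $(X^2 - 1)^k$ with the binomial theorem and differentiating k times; the Gram matrix is then computed with exact rational integrals over $[-1, 1]$ and compared with $\|L_k\|^2 = \frac{2^{2k+1}(k!)^2}{2k+1}$. The bound `n = 6` is an arbitrary choice, and `math.comb` needs Python 3.8 or later.

```python
from fractions import Fraction
from math import comb, factorial


def derivative(p, times=1):
    """Differentiate a coefficient list (p[i] is the coefficient of X^i)."""
    for _ in range(times):
        p = [i * c for i, c in enumerate(p)][1:] or [0]
    return p


def integral_sym(p):
    """Exact value of the integral over [-1, 1]; only even powers contribute."""
    return sum(Fraction(2 * c, k + 1) for k, c in enumerate(p) if k % 2 == 0)


def mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def L(k):
    # (X^2 - 1)^k = sum_j C(k, j) (-1)^(k-j) X^(2j), then differentiate k times.
    p = [0] * (2 * k + 1)
    for j in range(k + 1):
        p[2 * j] = comb(k, j) * (-1) ** (k - j)
    return derivative(p, k)


n = 6
Ls = [L(k) for k in range(n + 1)]
for j in range(n + 1):
    for k in range(n + 1):
        value = integral_sym(mul(Ls[j], Ls[k]))
        if j != k:
            assert value == 0                                                    # part a)
        else:
            assert value == Fraction(2 ** (2 * k + 1) * factorial(k) ** 2, 2 * k + 1)  # part b)
print("orthogonality and norms verified up to n =", n)
```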


10.5.1 Problems for Practice 


1. Apply the Gram-Schmidt algorithm to the vectors

$$v_1 = (1, 2, -2), \quad v_2 = (0, -1, 2), \quad v_3 = (-1, 3, 1).$$

2. Consider the vector space V of polynomials with real coefficients and degree not exceeding 2, endowed with the inner product defined by

$$\langle P, Q \rangle = \int_0^1 x P(x) Q(x)\,dx.$$

Apply the Gram-Schmidt algorithm to the vectors $1, X, X^2$.

3. Consider the map $\langle\ ,\ \rangle : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ defined by

$$\langle (x_1, x_2, x_3), (y_1, y_2, y_3) \rangle = (x_1 + x_2 + x_3)(y_1 + y_2 + y_3) + (x_2 + x_3)(y_2 + y_3) + x_3 y_3.$$

a) Check that this defines an inner product on $\mathbb{R}^3$.
b) Applying the Gram-Schmidt algorithm to the canonical basis of $\mathbb{R}^3$, give an orthonormal basis for $\mathbb{R}^3$ endowed with this inner product.
4. Find an orthogonal basis of $\mathbb{R}^4$ containing the vector $(1, 2, -1, -2)$.

5. (The QR factorization) Let $A \in M_{m,n}(\mathbb{R})$ be a matrix with linearly independent columns $C_1, \ldots, C_n$. Let W be the span of $C_1, \ldots, C_n$, a subspace of $\mathbb{R}^m$.

a) Prove that there is a matrix $Q \in M_{m,n}(\mathbb{R})$ whose columns are an orthonormal basis of W, and there is an upper-triangular matrix $R \in M_n(\mathbb{R})$ with positive diagonal entries such that

$$A = QR.$$

Hint: the columns of Q are the result of applying the Gram-Schmidt process to the columns of A.

b) Prove that the factorization $A = QR$ with Q, R matrices as in part a) is unique.

6. Using the Gram-Schmidt process, find the QR factorization of the matrix

$$A = \begin{pmatrix} 2 & 3 & 5 \\ 0 & 4 & 6 \\ 0 & 0 & 7 \end{pmatrix}.$$

7. Find the QR factorization of the matrix

$$A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 1 & 3 \end{pmatrix}.$$

8. Describe the QR factorization of an upper-triangular matrix $A \in M_n(\mathbb{R})$.
9. If $f : \mathbb{R} \to \mathbb{R}$ is a continuous $2\pi$-periodic function, we denote

$$c_n(f) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx.$$

a) Prove that if f is continuously differentiable, then for all $n \in \mathbb{Z}$ we have

$$c_n(f') = in \cdot c_n(f).$$

b) Deduce that under the assumptions of a) we have

$$\lim_{n \to \infty} n c_n(f) = 0$$

and

$$\sum_{n \in \mathbb{Z}} |c_n(f)| < \infty.$$

Hint: use the Riemann-Lebesgue theorem and Bessel's inequality, as well as part a).

c) Prove that if $f, g : \mathbb{R} \to \mathbb{R}$ are continuous $2\pi$-periodic maps such that $c_n(f) = c_n(g)$ for all $n \in \mathbb{Z}$, then $f = g$. Hint: use Plancherel's theorem for the function $f - g$.


10. Consider the $2\pi$-periodic function $f : \mathbb{R} \to \mathbb{R}$ such that $f(0) = f(\pi) = 0$, $f(t) = 1$ for $t \in (0, \pi)$ and f is odd, i.e., $f(-x) = -f(x)$ for all x.

a) Explain why such a map exists, plot its graph and show that it is piecewise continuous.

b) Compute its Fourier coefficients $a_m(f)$ and $b_m(f)$ for all $m \ge 0$.

c) Using Plancherel's theorem, deduce Euler's famous identity

$$\sum_{n \ge 0} \frac{1}{(2n+1)^2} = \frac{\pi^2}{8}.$$

d) Deduce from part c) the even more famous Euler's identity

$$\sum_{n \ge 1} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
11. Consider the $2\pi$-periodic function $f : \mathbb{R} \to \mathbb{R}$ such that $f(t) = t^2$ for $t \in [-\pi, \pi]$.

a) Compute the Fourier coefficients of f.
b) Using Plancherel's theorem, prove the following identity

$$\sum_{n \ge 1} \frac{1}{n^4} = \frac{\pi^4}{90}.$$

12. Let E be a Euclidean space, let $e_1, \ldots, e_n$ be an orthonormal basis of E and let T be a linear transformation on E. Prove that

$$\operatorname{Tr}(T) = \sum_{i=1}^{n} \langle T(e_i), e_i \rangle.$$

13. Let V be a Euclidean space and let T be a linear transformation on V such that $\operatorname{Tr}(T) = 0$. Let $e_1, \ldots, e_n$ be an orthonormal basis of V.
a) Prove that one can find $i, j \in \{1, 2, \ldots, n\}$ such that $\langle T(e_i), e_i \rangle$ and $\langle T(e_j), e_j \rangle$ have opposite signs. Hint: use Problem 12.
b) Check that the map $f : [0, 1] \to \mathbb{R}$ defined by

$$f(t) = \big\langle T(t e_i + (1-t) e_j),\; t e_i + (1-t) e_j \big\rangle$$

is continuous and that $f(0) f(1) < 0$.
c) Conclude that there is a nonzero vector $x \in V$ such that

$$\langle T(x), x \rangle = 0.$$

d) Finally, prove by induction on n that there is an orthogonal basis of V in which the diagonal entries of the matrix of T are all equal to 0.


14. Let V be a Euclidean space of dimension n, let $e_1, \ldots, e_n$ be an orthonormal basis of V and let $T : V \to V$ be an orthogonal projection. Show that

$$\operatorname{rank}(T) = \sum_{i=1}^{n} \|T(e_i)\|^2.$$
15. Let V be a Euclidean space of dimension n and let $e_1, \ldots, e_n$ be nonzero vectors in V such that for all $x \in V$ we have

$$\sum_{k=1}^{n} \langle e_k, x \rangle^2 = \|x\|^2.$$

a) Compute the orthogonal of $\operatorname{Span}(e_1, \ldots, e_n)$ and deduce that $e_1, \ldots, e_n$ is a basis of V.

b) By choosing $x = e_i$, prove that $\|e_i\| \le 1$ for all $1 \le i \le n$.

c) By choosing $x \in \operatorname{Span}(e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_n)^{\perp}$, prove that $\|e_i\| = 1$ for all $1 \le i \le n$.

d) Conclude that $e_1, \ldots, e_n$ is an orthonormal basis of V.


16. (Laguerre's polynomials) Let n be a positive integer and let V be the space of polynomials with real coefficients whose degree does not exceed n, endowed with

$$\langle P, Q \rangle = \int_0^{\infty} P(t) Q(t) e^{-t}\,dt.$$

a) Explain why $\langle\ ,\ \rangle$ is well defined and an inner product on V.
b) Define $h_k = (X^k e^{-X})^{(k)} e^{X}$ for $k \ge 0$. What are the coefficients of $h_k$?
c) Prove that for all $k \in [0, n]$ and all $P \in V$ we have

$$\langle P, h_k \rangle = (-1)^k \int_0^{\infty} P^{(k)}(t)\, t^k e^{-t}\,dt.$$

d) Prove that $h_0, \ldots, h_n$ is an orthogonal basis of V.
e) Prove that $\|h_k\| = k!$ for $k \in [0, n]$.


(Chebyshev’s polynomials) Let n be a positive integer and let V be the space 
of real polynomials with degree not exceeding n, endowed with 


j POO y 
2 = ee 


a) Explain why (, ) makes sense and defines an inner product on V. 

b) Prove that for each k > O there is a unique polynomial 7; (the kth 
Chebyshev polynomial) such that Tų (cos x) = cos kx forall x € R. 

c) Prove that To,..., T, is an orthogonal basis of V. 

d) Find ||7;,|| for k € [0,7]. 


(P,Q) = 


(Cross-product) Let V be an Euclidean space of dimension n > 3 and 


let (€,,...,@,) be a fixed orthonormal basis. If vj,...,v, E V, write 
det(v;,...,¥,) instead of dete; 6) V1; <- < Vn). 
a) Letvi,...,Vn—1 € V. Prove the existence of a unique vector v)A...AVn—1 € 


V such that for all v € V 
det(v1,...,Vn—1,¥) = (v, v1 A... A Vn=1) (10.6) 


We call this vector vj A... A vn—1 the cross-product of vi, ..., Vn—1. 
b) Prove that vı A... A Vn—1 is orthogonal to vj,..., Vn—1- 
c) Prove that v1, ... , Vn—1 are linearly dependent if and only if 


VA... N Vn- =0. 


d) Let v; = };—; ajje;. By choosing v = e; in (10.6) prove that 


n 
Vi Naess A Vy-1 = XODA; -ĉi, 


i=l 


where A; is the determinant of the (n — 1) x (n — 1) matrix obtained from 
[a;;] (i.e., the matrix whose columns are v;,...,Vn—1) by deleting the ith 
row. In particular, if n = 3 check that 


Uy VI U2V3 — U3V2 
uz | A| v | = | WV, — uiv 
u3 V3 Uy V2 — U2V1 
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e) Prove that if fi,..., fn is an orthonormal basis of V, then det( fi, ..., fa) € 
{—1, 1}. We say that fi,..., fn is positive or positively oriented (with 


respect to (e1,...,@n,)) if det(fi,..., fr) = 1. 
f) Prove that if vj,...,¥,—1 is an orthonormal family, then v1, ... , Vn—1, V1 A 
...A Vn—1 18 a positive orthonormal basis. 


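19. Let $v_1, \dots, v_{n-1}$ be linearly independent vectors in a Euclidean space $V$ of dimension $n \ge 3$. Let $H$ be the hyperplane spanned by $v_1, \dots, v_{n-1}$.

a) Prove that for all $v \in V$ we have

$$p_H(v) = v - \frac{\langle v_1 \wedge \dots \wedge v_{n-1}, v\rangle}{\|v_1 \wedge \dots \wedge v_{n-1}\|^2}\,(v_1 \wedge \dots \wedge v_{n-1})$$

and

$$d(v, H) = \frac{|\langle v, v_1 \wedge \dots \wedge v_{n-1}\rangle|}{\|v_1 \wedge \dots \wedge v_{n-1}\|}.$$

b) Prove that
$$H = \{v \in V \mid \langle v, v_1 \wedge \dots \wedge v_{n-1}\rangle = 0\}.$$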


20. In this problem $V$ is a Euclidean space of dimension 3.

a) (Lagrange's formula) Prove that for all $x, y \in V$ we have
$$\langle x, y\rangle^2 + \|x \wedge y\|^2 = \|x\|^2\,\|y\|^2.$$
b) Prove that if $\theta$ is the angle between $x$ and $y$, then
$$\|x \wedge y\| = \|x\|\,\|y\|\,|\sin\theta|.$$
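Taking $V = \mathbb{R}^3$ and the canonical basis as $(e_1, e_2, e_3)$, the following plain-Python sketch checks on arbitrarily chosen integer vectors that the explicit formula of Problem 18 d) satisfies the defining property (10.6), the orthogonality of Problem 18 b), and Lagrange's formula of Problem 20 a). Exact integer arithmetic is used, so the equalities are exact. The function names are illustrative.

```python
def dot(x, y):
    """Canonical inner product on R^3."""
    return sum(a * b for a, b in zip(x, y))


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c
    (cofactor expansion along the first row)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def cross(u, v):
    """Cross-product u ^ v in R^3, via the cofactor formula of Problem 18 d)."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


u = [1, 2, 3]
v = [-2, 0, 5]
w = cross(u, v)  # [10, -11, 4]

# Defining property (10.6): det(u, v, x) = <x, u ^ v> for every x.
x = [3, -1, 2]
print(det3(u, v, x), dot(x, w))  # 49 49

# Problem 18 b): u ^ v is orthogonal to u and to v.
print(dot(w, u), dot(w, v))  # 0 0

# Lagrange's formula: <u, v>^2 + ||u ^ v||^2 = ||u||^2 ||v||^2.
print(dot(u, v) ** 2 + dot(w, w), dot(u, u) * dot(v, v))  # 406 406
```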


21. This exercise develops the theory of orthogonal bases over the complex numbers. Let $V$ be a finite dimensional vector space over $\mathbb{C}$, endowed with a hermitian inner product $\langle\ ,\ \rangle$, i.e., a hermitian sesquilinear form $\langle\ ,\ \rangle : V \times V \to \mathbb{C}$ such that $\langle x, x\rangle > 0$ for all nonzero vectors $x \in V$. Such a space is called a hermitian space. Two vectors $x, y \in V$ are orthogonal if $\langle x, y\rangle = 0$. Starting with this definition, one defines the notions of orthogonal/orthonormal family and orthogonal/orthonormal basis as in the case of vector spaces over $\mathbb{R}$.

a) Prove that an orthogonal family consisting of nonzero vectors is linearly independent, and deduce that if $\dim V = n$, then an orthonormal family consisting of $n$ vectors is an orthonormal basis of $V$.

b) Let $e_1, \dots, e_n$ be an orthonormal basis of $V$ and let $x = x_1e_1 + \dots + x_ne_n$ and $y = y_1e_1 + \dots + y_ne_n$ be two vectors in $V$. Prove that

$$\langle x, y\rangle = \overline{x_1}y_1 + \dots + \overline{x_n}y_n$$

and

$$\|x\|^2 = |x_1|^2 + \dots + |x_n|^2.$$

c) State and prove a version of the Gram-Schmidt process in this context.

d) Prove that there is an orthonormal basis of $V$.

e) Prove that any orthonormal family in $V$ can be completed to an orthonormal basis of $V$.

f) Let $W$ be a subspace of $V$ and let $w_1, \dots, w_k$ be an orthonormal basis of $W$.

i) Prove that $W \oplus W^{\perp} = V$ and $(W^{\perp})^{\perp} = W$.

ii) The orthogonal projection $p_W$ of $V$ onto $W$ is the projection of $V$ onto $W$ along $W^{\perp}$. Prove that for all $v \in V$

$$p_W(v) = \sum_{i=1}^{k} \langle w_i, v\rangle w_i$$

and

$$\|v - p_W(v)\| = \min_{w \in W} \|v - w\|.$$


10.6 The Adjoint of a Linear Transformation 


Let $(V, \langle\ ,\ \rangle)$ be a Euclidean space (the condition that $V$ is finite dimensional will be crucial in this section, so we insist on it). Let $T : V \to V$ be a linear transformation. For all $y \in V$, the map $x \mapsto \langle T(x), y\rangle$ is a linear form on $V$. It follows from Theorem 10.37 that there is a unique vector $T^*(y) \in V$ such that

$$\langle T(x), y\rangle = \langle T^*(y), x\rangle = \langle x, T^*(y)\rangle$$

for all $x \in V$. We obtain in this way a map $T^* : V \to V$, uniquely characterized by the condition

$$\langle T(x), y\rangle = \langle x, T^*(y)\rangle$$

for all $x, y \in V$. It is easy to see that $T^*$ is itself linear, and we call $T^*$ the adjoint of $T$. All in all, we obtain the following
Theorem 10.69. Let $(V, \langle\ ,\ \rangle)$ be a Euclidean space. For each linear transformation $T : V \to V$ there is a unique linear transformation $T^* : V \to V$, called the adjoint of $T$, such that for all $x, y \in V$

$$\langle T(x), y\rangle = \langle x, T^*(y)\rangle.$$


As the following problem shows, the previous result fails rather badly if we don’t 
assume that V is finite dimensional: 


Problem 10.70. Let $V$ be the space of continuous real-valued maps on $[0, 1]$, endowed with the inner product

$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt.$$

Prove that the linear transformation $T$ sending $f$ to the constant map equal to $f(0)$ has no adjoint.


Solution. Suppose that $T$ has some adjoint $T^*$. Let $W = \ker T$, that is the subspace of maps $f$ with $f(0) = 0$. Fix $g \in V$. Since

$$\langle T(f), g\rangle = \langle f, T^*(g)\rangle$$

for all $f, g \in V$ and since $T(f) = 0$ for $f \in W$, we deduce that $\langle T^*(g), f\rangle = 0$ for all $f \in W$. Applying this to the function $f$ given by $x \mapsto xT^*(g)(x)$, which is in $W$, we conclude that

$$\langle T^*(g), f\rangle = \int_0^1 x\,(T^*(g)(x))^2\,dx = 0.$$

Since $x \mapsto x(T^*(g)(x))^2$ is continuous, nonnegative, and has integral equal to 0, it is the zero map, thus $T^*(g)$ vanishes on $(0, 1]$ and then on $[0, 1]$ by continuity. We conclude that $T^*(g) = 0$ for all $g \in V$, thus $\langle T(f), g\rangle = 0$ for all $f, g \in V$ and finally $T(f) = 0$ for all $f \in V$. This is absurd (for the constant map $f = 1$ we have $T(f) = 1$), so the problem is solved. □


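Note that for all $x, y \in V$ we have

$$\langle y, T(x)\rangle = \langle T(x), y\rangle = \langle x, T^*(y)\rangle = \langle T^*(y), x\rangle = \langle y, (T^*)^*(x)\rangle.$$

It follows that $T(x) - (T^*)^*(x)$ is orthogonal to every $y \in V$, hence $T(x) - (T^*)^*(x) = 0$ and so $(T^*)^* = T$, which we write as

$$T^{**} = T.$$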
Thus the map $T \mapsto T^*$ is an involution of the space of linear transformations on $V$. The fixed points of this involution are called symmetric or self-adjoint linear transformations. They will play a fundamental role in this chapter. More precisely, we introduce the following definitions:


Definition 10.71. Let $(V, \langle\ ,\ \rangle)$ be a Euclidean space. A linear transformation $T : V \to V$ is called symmetric or self-adjoint if $T^* = T$, and alternating or skew-symmetric if $T^* = -T$.


In the next problems the reader will have the opportunity to find quite a few 
different characterizations and/or properties of self-adjoint and alternating linear 
transformations. 


Problem 10.72. Let $(V, \langle\ ,\ \rangle)$ be a Euclidean space and let $T : V \to V$ be a linear transformation. Let $e_1, \dots, e_n$ be an orthonormal basis and let $A$ be the matrix of $T$ with respect to $e_1, \dots, e_n$. Prove that the matrix of $T^*$ with respect to $e_1, \dots, e_n$ is ${}^tA$. Thus $T$ is symmetric if and only if $A$ is symmetric, and $T$ is alternating if and only if $A$ is skew-symmetric (note the hypothesis that $e_1, \dots, e_n$ is an orthonormal basis, not just any basis!).


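Solution. Let $B = [b_{ij}]$ be the matrix of $T^*$ with respect to $e_1, \dots, e_n$, thus for all $j \in [1, n]$ we have

$$T^*(e_j) = \sum_{k=1}^{n} b_{kj} e_k.$$

Since

$$\langle T(e_i), e_j\rangle = \langle e_i, T^*(e_j)\rangle,$$

$T(e_i) = \sum_{k=1}^{n} a_{ki} e_k$, and the basis is orthonormal, we obtain

$$\langle T(e_i), e_j\rangle = \sum_{k=1}^{n} a_{ki}\langle e_k, e_j\rangle = a_{ji}$$

and

$$\langle e_i, T^*(e_j)\rangle = \sum_{k=1}^{n} b_{kj}\langle e_i, e_k\rangle = b_{ij}.$$

We conclude that $b_{ij} = a_{ji}$ for all $i, j \in [1, n]$, and the result follows. □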
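In coordinates, Problem 10.72 says that in an orthonormal basis the adjoint is obtained by transposing. The plain-Python sketch below works in $\mathbb{R}^3$ with the canonical (orthonormal) basis and an arbitrarily chosen integer matrix; it builds the matrix of $T^*$ entry by entry from $b_{ij} = \langle T(e_i), e_j\rangle$, exactly as in the solution, and compares it with ${}^tA$. Helper names are illustrative.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*A)]


def mat_vec(A, x):
    """Apply the matrix A to the column vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Canonical inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def basis_vector(i, n):
    """The i-th vector of the canonical (orthonormal) basis of R^n."""
    return [1 if k == i else 0 for k in range(n)]


def adjoint_matrix(A):
    """Matrix B of T*, computed entrywise as b_ij = <T(e_i), e_j>;
    this is valid because the canonical basis is orthonormal."""
    n = len(A)
    return [[dot(mat_vec(A, basis_vector(i, n)), basis_vector(j, n))
             for j in range(n)] for i in range(n)]


A = [[1, 2, 0], [-3, 4, 5], [7, 0, -1]]
B = adjoint_matrix(A)
print(B == transpose(A))  # True

# Defining identity of the adjoint: <T(x), y> = <x, T*(y)>.
x, y = [1, -2, 3], [4, 0, -1]
print(dot(mat_vec(A, x), y), dot(x, mat_vec(B, y)))  # -16 -16
```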


Problem 10.73. Prove that any two distinct eigenspaces of a symmetric linear 
transformation are orthogonal. 
Solution. Let $\lambda_1, \lambda_2$ be different eigenvalues of $T$ and let $x, y$ be nonzero vectors in $V$ such that $T(x) = \lambda_1 x$ and $T(y) = \lambda_2 y$. Since $T$ is symmetric we have

$$\langle T(x), y\rangle = \langle x, T(y)\rangle.$$

The left-hand side equals $\lambda_1\langle x, y\rangle$, while the right-hand side equals $\lambda_2\langle x, y\rangle$. Since $\lambda_1 \ne \lambda_2$, it follows that $\langle x, y\rangle = 0$ and the result follows. □


Problem 10.74. Let $n$ be a positive integer and let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$, endowed with the inner product defined by

$$\langle P, Q\rangle = \int_{-1}^{1} P(x)Q(x)\,dx.$$

Prove that the linear transformation $T : V \to V$ sending $P$ to $2XP'(X) + (X^2 - 1)P''(X)$ is symmetric.


Solution. Since the maps $P \mapsto P'$ and $P \mapsto P''$ are linear, it follows that $T$ is a linear transformation (note that $T(P)$ belongs to $V$ since $\deg XP' \le \deg P \le n$ and $\deg (X^2 - 1)P'' \le \deg P \le n$). In order to prove that $T$ is symmetric, we need to prove that

$$\langle T(P), Q\rangle = \langle P, T(Q)\rangle$$

for all polynomials $P, Q \in V$. Note that using the product rule for derivatives, we can write

$$T(P) = ((X^2 - 1)P')'.$$

Hence integration by parts gives (the boundary term $[(x^2 - 1)P'(x)Q(x)]_{-1}^{1}$ vanishes because $x^2 - 1$ vanishes at $\pm 1$)

$$\langle T(P), Q\rangle = \int_{-1}^{1} ((x^2 - 1)P'(x))'\,Q(x)\,dx = \int_{-1}^{1} (1 - x^2)P'(x)Q'(x)\,dx.$$

Note that this last expression is symmetric in $P$ and $Q$, so it also equals $\langle T(Q), P\rangle = \langle P, T(Q)\rangle$. Thus $T$ is symmetric. □
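Since all the integrals involved are integrals of polynomials over $[-1, 1]$, the symmetry in Problem 10.74 can be checked exactly with rational arithmetic. The sketch below (plain Python; the coefficient-list representation and the helper names are our own choices) verifies $\langle T(P), Q\rangle = \langle P, T(Q)\rangle$ and the integrated-by-parts form for two arbitrarily chosen polynomials.

```python
from fractions import Fraction

# A polynomial is stored as a list of coefficients: p[i] is the coefficient of X**i.


def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def mul(p, q):
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def deriv(p):
    return [i * p[i] for i in range(1, len(p))]


def inner(p, q):
    """<P, Q> = integral of P(x) Q(x) over [-1, 1], computed exactly:
    the integral of x**k is 2/(k+1) for even k and 0 for odd k."""
    return sum(Fraction(2, k + 1) * c for k, c in enumerate(mul(p, q)) if k % 2 == 0)


def T(p):
    """T(P) = 2X P' + (X^2 - 1) P''."""
    return add(mul([0, 2], deriv(p)), mul([-1, 0, 1], deriv(deriv(p))))


P = [1, -2, 0, 3]     # 1 - 2X + 3X^3
Q = [0, 5, 1, 0, -1]  # 5X + X^2 - X^4

print(inner(T(P), Q) == inner(P, T(Q)))  # True
# The symmetric expression obtained by integration by parts:
print(inner(T(P), Q) == inner(mul([1, 0, -1], deriv(P)), deriv(Q)))  # True
```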


Problem 10.75. Let $V$ be a Euclidean space and let $T : V \to V$ be a linear transformation.

a) Prove that $T$ is alternating if and only if $\langle T(x), x\rangle = 0$ for all $x \in V$.
b) Prove that if this is the case, then the only possible real root of the characteristic polynomial of $T$ is 0.
Solution. a) Suppose that $T$ is alternating, thus $T + T^* = 0$. Then for all $x \in V$ we have

$$\langle T(x), x\rangle = \langle x, T^*(x)\rangle = \langle x, -T(x)\rangle = -\langle T(x), x\rangle,$$

thus $\langle T(x), x\rangle = 0$.

Conversely, suppose that $\langle T(x), x\rangle = 0$ for all $x \in V$. Thus for all $x, y \in V$ we have

$$\begin{aligned} 0 = \langle T(x + y), x + y\rangle &= \langle T(x), x\rangle + \langle T(x), y\rangle + \langle T(y), x\rangle + \langle T(y), y\rangle \\ &= \langle T(x), y\rangle + \langle T(y), x\rangle = \langle T(x), y\rangle + \langle T^*(x), y\rangle = \langle (T + T^*)(x), y\rangle. \end{aligned}$$

Thus $(T + T^*)(x)$ is orthogonal to $V$ and so it equals 0, and this holds for all $x \in V$. It follows that $T$ is alternating.

b) Suppose that $\lambda$ is a real root of the characteristic polynomial of $T$. Thus there is a nonzero vector $x \in V$ such that $T(x) = \lambda x$. Then

$$\lambda\|x\|^2 = \langle \lambda x, x\rangle = \langle T(x), x\rangle = 0,$$

and so $\lambda = 0$, since $x \ne 0$. □
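By Problem 10.72, $T$ is alternating exactly when its matrix in an orthonormal basis is skew-symmetric. The plain-Python sketch below takes an arbitrarily chosen $3 \times 3$ skew-symmetric integer matrix $S$ acting on $\mathbb{R}^3$ with the canonical inner product, checks $\langle Sx, x\rangle = 0$ on a grid of integer vectors (part a), and checks that $\det S = 0$: in odd dimension the characteristic polynomial has a real root, which by part b) must be 0. Helper names are illustrative.

```python
def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


# A skew-symmetric matrix: its transpose is its negative.
S = [[0, 2, -1],
     [-2, 0, 3],
     [1, -3, 0]]

# a) <S x, x> = 0 for every x; we test a small grid of integer vectors.
values = {dot(mat_vec(S, [a, b, c]), [a, b, c])
          for a in range(-2, 3) for b in range(-2, 3) for c in range(-2, 3)}
print(values)  # {0}

# b) in odd dimension, 0 must be an eigenvalue, so S is not invertible.
print(det3(S))  # 0
```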


Problem 10.76. Let $V$ be a Euclidean space and let $e_1, \dots, e_n$ be a basis of $V$. Prove that the map $T : V \to V$ defined by

$$T(x) = \sum_{k=1}^{n} \langle e_k, x\rangle e_k$$

is a symmetric linear transformation on $V$. Is $T$ positive? Is it positive definite?


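Solution. Note that $x \mapsto \langle e_k, x\rangle$ is a linear map for all $1 \le k \le n$ (by definition of an inner product). It follows that $T$ itself is a linear transformation of $V$. In order to check that $T$ is symmetric, we need to prove that

$$\langle T(x), y\rangle = \langle x, T(y)\rangle$$

for all $x, y \in V$. Using the bilinearity of $\langle\ ,\ \rangle$, we obtain

$$\langle T(x), y\rangle = \Big\langle \sum_{k=1}^{n} \langle e_k, x\rangle e_k,\ y\Big\rangle = \sum_{k=1}^{n} \langle e_k, x\rangle \cdot \langle e_k, y\rangle.$$

A similar computation yields

$$\langle x, T(y)\rangle = \Big\langle x,\ \sum_{k=1}^{n} \langle e_k, y\rangle e_k\Big\rangle = \sum_{k=1}^{n} \langle e_k, x\rangle \cdot \langle e_k, y\rangle,$$

establishing therefore the desired equality and proving that $T$ is symmetric.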
Notice that the previous computations also give

$$\langle T(x), x\rangle = \sum_{k=1}^{n} \langle e_k, x\rangle^2.$$

The last sum is nonnegative since it is a sum of squares of real numbers. It follows that $T$ is positive. Moreover, if $\langle T(x), x\rangle = 0$, then the previous argument yields $\langle e_k, x\rangle = 0$ for all $1 \le k \le n$. Thus $x$ is orthogonal to $\operatorname{Span}(e_1, \dots, e_n) = V$ and so $x = 0$. It follows that $T$ is positive definite. □
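For $V = \mathbb{R}^n$ with its canonical inner product, the matrix of the map $T$ of Problem 10.76 is $\sum_k e_k\,{}^te_k$, which is visibly symmetric. The sketch below (plain Python, illustrative names, an arbitrarily chosen non-orthonormal basis of $\mathbb{R}^2$) builds this matrix and compares $\langle T(x), x\rangle$ with $\sum_k \langle e_k, x\rangle^2$.

```python
def frame_operator_matrix(vectors):
    """Matrix, in the canonical basis of R^n, of T(x) = sum_k <e_k, x> e_k;
    its (i, j) entry is sum_k e_k[i] * e_k[j]."""
    n = len(vectors[0])
    return [[sum(e[i] * e[j] for e in vectors) for j in range(n)] for i in range(n)]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# A basis of R^2 that is not orthonormal.
basis = [[1, 0], [1, 1]]
M = frame_operator_matrix(basis)
print(M)  # [[2, 1], [1, 1]]

# <T(x), x> equals the sum of squares sum_k <e_k, x>^2.
x = [1, -2]
print(dot(mat_vec(M, x), x), sum(dot(e, x) ** 2 for e in basis))  # 2 2
```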


Problem 10.77. Let $T$ be a linear transformation on a Euclidean space $V$. Prove that the following statements are equivalent:

a) For all $x \in V$ we have $\|T(x)\| = \|T^*(x)\|$.
b) For all $x, y \in V$ we have $\langle T(x), T(y)\rangle = \langle T^*(x), T^*(y)\rangle$.
c) $T^*$ and $T$ commute.

Such a linear transformation $T$ is called normal. Note that symmetric as well as alternating linear transformations are normal.


Solution. Suppose that a) holds. Using the polarization identity twice and the linearity of $T$ and $T^*$, we obtain

$$\langle T(x), T(y)\rangle = \frac{\|T(x + y)\|^2 - \|T(x)\|^2 - \|T(y)\|^2}{2} = \frac{\|T^*(x + y)\|^2 - \|T^*(x)\|^2 - \|T^*(y)\|^2}{2} = \langle T^*(x), T^*(y)\rangle.$$

Thus b) holds.

Suppose now that b) holds. For all $x, y \in V$ we have

$$\begin{aligned} \langle (T \circ T^* - T^* \circ T)(x), y\rangle &= \langle T(T^*(x)), y\rangle - \langle T^*(T(x)), y\rangle \\ &= \langle T^*(x), T^*(y)\rangle - \langle T(x), T(y)\rangle = 0. \end{aligned}$$

Thus $(T \circ T^* - T^* \circ T)(x) = 0$ for all $x \in V$, that is $T$ and $T^*$ commute and so c) holds.

Finally, suppose that c) holds. Then

$$\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle x, T^*(T(x))\rangle = \langle x, T(T^*(x))\rangle = \langle T(T^*(x)), x\rangle = \langle T^*(x), T^*(x)\rangle = \|T^*(x)\|^2,$$

thus $\|T(x)\| = \|T^*(x)\|$ for all $x \in V$ and so a) holds. The problem is solved. □
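Combining Problem 10.77 with Problem 10.72, a linear transformation whose matrix in an orthonormal basis is $A$ is normal exactly when $A\,{}^tA = {}^tA\,A$. The plain-Python sketch below (illustrative names; $\mathbb{R}^2$ with the canonical inner product) tests this for a matrix of the shape $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$ and for a shear, and compares $\|Ax\|$ with $\|{}^tA\,x\|$ as in condition a).

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def is_normal(A):
    """In an orthonormal basis the adjoint is the transpose (Problem 10.72),
    so T is normal exactly when A commutes with its transpose."""
    At = transpose(A)
    return mat_mul(A, At) == mat_mul(At, A)


def sq_norm(x):
    return sum(c * c for c in x)


N = [[2, 3], [-3, 2]]  # of the form [[a, b], [-b, a]]
J = [[1, 1], [0, 1]]   # a shear, not normal

print(is_normal(N), is_normal(J))  # True False

# Condition a): ||A x|| = ||tA x||, tested at x = (1, 0) via squared norms.
x = [1, 0]
print(sq_norm(mat_vec(N, x)), sq_norm(mat_vec(transpose(N), x)))  # 13 13
print(sq_norm(mat_vec(J, x)), sq_norm(mat_vec(transpose(J), x)))  # 1 2
```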
Problem 10.78. Let $T$ be a normal linear transformation on a Euclidean space $V$. Prove that if $V_1$ is a subspace of $V$ which is stable under $T$, then $V_1^{\perp}$ is also stable under $T$.

Solution. The result is clear if $V_1 = 0$ or $V_1 = V$, so assume that this is not the case. Choose an orthonormal basis $e_1, \dots, e_n$ of $V$ obtained by patching an orthonormal basis of $V_1$ and an orthonormal basis of $V_1^{\perp}$. Since $V_1$ is stable under $T$, the matrix of $T$ with respect to $e_1, \dots, e_n$ is of the form $M = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}$ for some matrices $A, B, C$. Since $T$ and $T^*$ commute and the matrix of $T^*$ is ${}^tM$ (Problem 10.72), we have

$$\begin{pmatrix} A & B \\ 0 & C \end{pmatrix} \begin{pmatrix} {}^tA & 0 \\ {}^tB & {}^tC \end{pmatrix} = \begin{pmatrix} {}^tA & 0 \\ {}^tB & {}^tC \end{pmatrix} \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}.$$

Comparing the lower right blocks, we must have $C\,{}^tC = {}^tB\,B + {}^tC\,C$. Thus

$$\operatorname{Tr}({}^tB\,B) = \operatorname{Tr}(C\,{}^tC) - \operatorname{Tr}({}^tC\,C) = 0,$$

which can be written as $\sum_{i,j} b_{ij}^2 = 0$, where $B = [b_{ij}]$. We deduce that $b_{ij} = 0$ for all $i, j$, that is $B = 0$. But then it is clear that $V_1^{\perp}$ is stable under $T$. □


10.6.1 Problems for Practice 


1. Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$. Prove that $\ker(T^* \circ T) = \ker T$. Hint: if $x \in \ker(T^* \circ T)$, compute $\|T(x)\|^2$.

2. Let $T$ be a symmetric linear transformation of a Euclidean space $V$. Prove that $V = \operatorname{Im}(T) \oplus \ker T$ and that $\operatorname{Im}(T)$ and $\ker T$ are orthogonal.

3. Prove that if $T$ is a normal endomorphism of a Euclidean space $V$, then $\ker T = \ker T^*$.

4. Prove that if $T$ is a linear transformation on a Euclidean space $V$, then $\det T = \det T^*$.

5. Prove that if $T$ is a linear transformation on a Euclidean space $V$, then
$$\ker(T^*) = \operatorname{Im}(T)^{\perp}, \qquad \operatorname{Im}(T^*) = (\ker T)^{\perp}.$$

6. Let $V$ be a Euclidean space and let $v \in V$ be a vector with $\|v\| = 1$. Prove that if $k$ is a real number, then the map

$$T : V \to V, \qquad T(x) = x + k\langle x, v\rangle v$$

is a symmetric linear transformation on $V$.
7. Let $V$ be the space of polynomials with real coefficients whose degree does not exceed $n$ and consider the map $\langle\ ,\ \rangle : V \times V \to \mathbb{R}$ defined by

$$\langle P, Q\rangle = \int_{-1}^{1} \sqrt{\frac{1 - x}{1 + x}}\, P(x)Q(x)\,dx.$$

a) Explain why $\langle\ ,\ \rangle$ is well defined and an inner product on $V$.
b) Prove that the map $T : V \to V$ defined by

$$T(P(X)) = (X^2 - 1)P''(X) + (2X + 1)P'(X)$$

is a self-adjoint linear transformation on $V$.

8. Prove that if $a, b$ are real numbers, then the linear transformation

$$T : \mathbb{R}^2 \to \mathbb{R}^2, \qquad T(x, y) = (ax + by, -bx + ay)$$

is normal.

9. Let $V$ be a Euclidean space of dimension 2 and let $T : V \to V$ be a normal linear transformation. Let $A$ be the matrix of $T$ with respect to an orthonormal basis of $V$. Prove that either $T$ is symmetric or

$$A = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}$$

for some real numbers $a, b$.

10. Let $P \in GL_n(\mathbb{R})$ be an invertible matrix and let $E = M_n(\mathbb{R})$ endowed with the inner product given by

$$\langle A, B\rangle = \operatorname{Tr}({}^tA\,B).$$

Find the adjoint of the linear transformation $T : E \to E$ sending $A$ to $PAP^{-1}$.

11. Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$ such that $\|T(x)\| \le \|x\|$ for all $x \in V$.

a) Prove that $\|T^*(x)\| \le \|x\|$ for all $x \in V$.
b) Prove that $\ker(T - \mathrm{id}) = \ker(T^* - \mathrm{id})$.
c) Deduce that $V$ is the orthogonal direct sum of $\ker(T - \mathrm{id})$ and $\operatorname{Im}(T - \mathrm{id})$.

12. Let $V$ be a hermitian space, that is a finite dimensional vector space over $\mathbb{C}$ endowed with a hermitian inner product $\langle\ ,\ \rangle : V \times V \to \mathbb{C}$.

a) Prove that for any linear transformation $T : V \to V$ there is a unique linear transformation $T^* : V \to V$ (called the adjoint of $T$) such that for all $x, y \in V$

$$\langle x, T(y)\rangle = \langle T^*(x), y\rangle.$$

Be careful that the left-hand side is no longer equal to $\langle T(y), x\rangle$, but rather to $\overline{\langle T(y), x\rangle}$.
b) Prove that the map $T \mapsto T^*$ is a conjugate-linear involution on the space of linear transformations on $V$ (that is, $(S + \lambda T)^* = S^* + \overline{\lambda}\,T^*$ for $\lambda \in \mathbb{C}$), such that for all $S, T$

$$(S \circ T)^* = T^* \circ S^*.$$

c) Prove that $T$ is invertible if and only if $T^*$ is invertible, and then $(T^*)^{-1} = (T^{-1})^*$.

d) If $e_1, \dots, e_n$ is an orthonormal basis of $V$ and if $A$ is the matrix of $T$ with respect to this basis, prove that the matrix of $T^*$ is $A^* := {}^t\overline{A}$. We say that $T$ is self-adjoint or hermitian if $T = T^*$.

e) Prove that any orthogonal projection is a hermitian linear transformation.

f) Prove that $\ker T^* = (\operatorname{Im}(T))^{\perp}$ and $\operatorname{Im}(T^*) = (\ker T)^{\perp}$.

g) Prove that if $T$ is hermitian, then the orthogonal of a subspace stable under $T$ is also stable under $T$.
Let $V_1, V_2$ be Euclidean spaces with inner products $\langle\ ,\ \rangle_1$ and $\langle\ ,\ \rangle_2$, and with corresponding norms $\|\cdot\|_1$ and $\|\cdot\|_2$.

Definition 10.79. An isometry (or isomorphism of Euclidean spaces) between $V_1$ and $V_2$ is an isomorphism of $\mathbb{R}$-vector spaces $T : V_1 \to V_2$ such that for all $x, y \in V_1$

$$\langle T(x), T(y)\rangle_2 = \langle x, y\rangle_1.$$

Thus an isometry is a bijective linear map which is compatible with the inner products on $V_1$ and $V_2$. The following exercise gives an equivalent formulation of this compatibility:

Problem 10.80. Let $V_1$ and $V_2$ be as above and let $T : V_1 \to V_2$ be a linear transformation. Prove that the following statements are equivalent:

a) For all $x, y \in V_1$ we have

$$\langle T(x), T(y)\rangle_2 = \langle x, y\rangle_1.$$

b) For all $x \in V_1$ we have $\|T(x)\|_2 = \|x\|_1$.
Solution. If a) holds, then taking $y = x$ we obtain

$$\|T(x)\|_2^2 = \|x\|_1^2$$

and so $\|T(x)\|_2 = \|x\|_1$, showing that b) holds.

If b) holds, then the polarization identity and the linearity of $T$ yield

$$\langle T(x), T(y)\rangle_2 = \frac{\|T(x + y)\|_2^2 - \|T(x)\|_2^2 - \|T(y)\|_2^2}{2} = \frac{\|x + y\|_1^2 - \|x\|_1^2 - \|y\|_1^2}{2} = \langle x, y\rangle_1,$$

finishing the solution. □


Remark 10.81. If $T$ is a linear transformation as in the previous problem, then $T$ is automatically injective: if $T(x) = 0$, then $\|T(x)\|_2 = 0$, thus $\|x\|_1 = 0$ and then $x = 0$.


Definition 10.82. a) Let $V$ be a Euclidean space. A linear transformation $T : V \to V$ is called orthogonal if $T$ is an isometry between $V$ and $V$. In other words, $T$ is orthogonal if $T$ is bijective and for all $x, y \in V$

$$\langle T(x), T(y)\rangle = \langle x, y\rangle.$$

Note that the bijectivity of $T$ is a consequence of the last relation, thanks to the previous remark (an injective linear transformation of the finite dimensional space $V$ is bijective). Thus $T$ is orthogonal if and only if $T$ preserves the inner product.

b) A matrix $A \in M_n(\mathbb{R})$ is called orthogonal if

$$A \cdot {}^tA = I_n.$$

The equivalence between the first and last points in the following problem implies the following compatibility of the previous definitions: let $A \in M_n(\mathbb{R})$ and endow $\mathbb{R}^n$ with its canonical inner product. Then $A$ is orthogonal if and only if the linear transformation $X \mapsto AX$ on $\mathbb{R}^n$ is orthogonal. Also, by the previous problem a linear map $T$ on $V$ is orthogonal if and only if $\|T(x)\| = \|x\|$ for all $x \in V$. Hence $A$ is orthogonal if and only if

$$\|AX\| = \|X\|$$

for all $X \in \mathbb{R}^n$, where $\|\cdot\|$ is the norm associated with the canonical inner product on $\mathbb{R}^n$, that is

$$\|(x_1, \dots, x_n)\| = \sqrt{x_1^2 + \dots + x_n^2}.$$
Example 10.83. A very important class of orthogonal transformations/matrices is given by orthogonal symmetries. Namely, consider a Euclidean space $V$ and a subspace $W$. Then $V = W \oplus W^{\perp}$, so we can define the symmetry $s_W$ with respect to $W$ along $W^{\perp}$. Recall that if $v \in V$ is written as $v = w + w^{\perp}$ with $w \in W$ and $w^{\perp} \in W^{\perp}$, then

$$s_W(v) = w - w^{\perp},$$

so that $s_W$ fixes $W$ pointwise, and $-s_W$ fixes $W^{\perp}$ pointwise.

In order to see that $s_W$ is an orthogonal transformation, it suffices to check that $\|s_W(v)\| = \|v\|$ for all $v \in V$, or equivalently

$$\|w - w^{\perp}\| = \|w + w^{\perp}\|$$

for all $(w, w^{\perp}) \in W \times W^{\perp}$. But by the Pythagorean theorem the squares of both sides are equal to $\|w\|^2 + \|w^{\perp}\|^2$, whence the result.

Orthogonal symmetries can be easily recognized among orthogonal maps: they are precisely the self-adjoint orthogonal transformations, that is, those whose matrices in an orthonormal basis of the space are simultaneously symmetric and orthogonal. The point is that an orthogonal matrix $A$ is symmetric if and only if $A^2 = I_n$, since $A^{-1} = {}^tA$.
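When $W$ is a hyperplane $v^{\perp}$ of $\mathbb{R}^n$ (canonical inner product), we have $W^{\perp} = \mathbb{R}v$, and the decomposition $v = w + w^{\perp}$ gives the familiar formula $s_W(x) = x - 2\frac{\langle x, v\rangle}{\langle v, v\rangle}v$ (a Householder reflection). The plain-Python sketch below builds this matrix with exact rational entries for an arbitrarily chosen $v$ and checks that it is symmetric, orthogonal, and an involution, as described above. Helper names are illustrative.

```python
from fractions import Fraction


def symmetry_across_hyperplane(v):
    """Matrix of s_W for W = v^perp in R^n with the canonical inner product:
    s_W(x) = x - 2 <x, v> / <v, v> * v, i.e. I - 2 v tv / <v, v>."""
    n = len(v)
    vv = sum(c * c for c in v)
    return [[(1 if i == j else 0) - Fraction(2 * v[i] * v[j], vv) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


v = [1, 2, 2]
S = symmetry_across_hyperplane(v)
I = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

print(S == transpose(S))              # True: symmetric
print(mat_mul(S, transpose(S)) == I)  # True: orthogonal
print(mat_mul(S, S) == I)             # True: an involution
print(mat_vec(S, v) == [-1, -2, -2])  # True: v spans W^perp, so it is negated
print(mat_vec(S, [2, -1, 0]) == [2, -1, 0])  # True: (2, -1, 0) lies in W
```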


Let us come back to the general context of an orthogonal matrix $A \in M_n(\mathbb{R})$ and analyze a little bit the relation

$$A \cdot {}^tA = I_n.$$

Using the rule for matrix multiplication and denoting by $R_1, \dots, R_n$ the rows of $A$, we see that the previous equality is equivalent to

$$\langle R_i, R_j\rangle = 0 \ \text{ if } i \ne j, \qquad \|R_i\|^2 = 1, \quad 1 \le i \le n;$$

in other words $A$ is orthogonal if and only if its rows $R_1, \dots, R_n$ form an orthonormal basis of $\mathbb{R}^n$. Also, notice that $A$ is orthogonal if and only if ${}^tA$ is orthogonal (both conditions say that ${}^tA = A^{-1}$), thus we have just proved the following:


Theorem 10.84. Let $A \in M_n(\mathbb{R})$ be a matrix and endow $\mathbb{R}^n$ with the canonical inner product, with associated norm $\|\cdot\|$. The following statements are equivalent:

a) $A$ is orthogonal.

b) The rows of $A$ form an orthonormal basis of $\mathbb{R}^n$.

c) The columns of $A$ form an orthonormal basis of $\mathbb{R}^n$.

d) For all $X \in \mathbb{R}^n$ we have

$$\|AX\| = \|X\|.$$
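The equivalences of Theorem 10.84 can be checked exactly on a matrix with rational entries. The plain-Python sketch below uses the arbitrarily chosen matrix $A = \frac{1}{3}\begin{pmatrix} 1 & -2 & 2 \\ 2 & -1 & -2 \\ 2 & 2 & 1 \end{pmatrix}$ and tests each of a), b), c) and d), the last one only for a single vector $X$. Helper names are illustrative.

```python
from fractions import Fraction


def transpose(A):
    return [list(row) for row in zip(*A)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def is_orthonormal(vectors):
    """True if the vectors are pairwise orthogonal unit vectors."""
    return all(dot(u, w) == (1 if i == j else 0)
               for i, u in enumerate(vectors) for j, w in enumerate(vectors))


third = Fraction(1, 3)
A = [[third * a for a in row] for row in [[1, -2, 2], [2, -1, -2], [2, 2, 1]]]
I = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

print(mat_mul(A, transpose(A)) == I)  # a) True
print(is_orthonormal(A))              # b) rows: True
print(is_orthonormal(transpose(A)))   # c) columns: True
X = [5, -1, 4]
print(dot(mat_vec(A, X), mat_vec(A, X)) == dot(X, X))  # d) True
```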
Problem 10.85. Let $V$ be a Euclidean space and let $T : V \to V$ be a linear transformation. Prove that the following assertions are equivalent:

a) $T$ is orthogonal.

b) We have $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in V$.

c) For all $x \in V$ we have $\|T(x)\| = \|x\|$.

d) $T^* \circ T = \mathrm{Id}$.

Solution. By definition a) implies b), which is equivalent to c) by Problem 10.80. If b) holds, then

$$\langle T^* \circ T(x) - x, y\rangle = \langle y, T^*(T(x))\rangle - \langle x, y\rangle = \langle T(x), T(y)\rangle - \langle x, y\rangle = 0$$

for all $x, y \in V$, thus $T^*(T(x)) = x$ for all $x \in V$ and d) follows. It remains to see that d) implies a). It already implies that $T$ is bijective, with inverse $T^*$, so it suffices to see that b) holds. Since b) is equivalent to c) by Problem 10.80, it suffices to check that c) holds. But

$$\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle x, T^*(T(x))\rangle = \langle x, x\rangle = \|x\|^2$$

for all $x \in V$, which yields c). □


We can also characterize orthogonal linear transformations in terms of their effect 
on orthonormal bases, as the following problem shows: 


Problem 10.86. Let $V$ be a Euclidean space and let $T : V \to V$ be a linear transformation. Then the following statements are equivalent:

a) $T$ is orthogonal.

b) For any orthonormal basis $e_1, \dots, e_n$ of $V$, the vectors $T(e_1), \dots, T(e_n)$ form an orthonormal basis of $V$.

c) There is an orthonormal basis $e_1, \dots, e_n$ of $V$ such that $T(e_1), \dots, T(e_n)$ is an orthonormal basis of $V$.

Solution. Suppose that a) holds and let $e_1, \dots, e_n$ be an orthonormal basis of $V$. Then for all $i, j \in [1, n]$ we have

$$\langle T(e_i), T(e_j)\rangle = \langle e_i, e_j\rangle = \delta_{ij}.$$

It follows that $T(e_1), \dots, T(e_n)$ is an orthonormal family, and since it has $n = \dim V$ elements, we deduce that it is an orthonormal basis of $V$. Thus a) implies b), which clearly implies c).

Suppose that c) holds. Let $x \in V$ and write $x = x_1e_1 + \dots + x_ne_n$. Since $T(e_1), \dots, T(e_n)$ and $e_1, \dots, e_n$ are orthonormal bases of $V$, we have

$$\|T(x)\|^2 = \|x_1T(e_1) + \dots + x_nT(e_n)\|^2 = x_1^2 + \dots + x_n^2 = \|x\|^2.$$

Thus $\|T(x)\| = \|x\|$ for all $x \in V$, and $T$ is orthogonal (by the previous problem). □
Remark 10.87. The previous problem has the following very useful consequence: let $e_1, \dots, e_n$ be an orthonormal basis of $V$ and let $e_1', \dots, e_n'$ be another basis of $V$. Let $P$ be the change of basis matrix from $e_1, \dots, e_n$ to $e_1', \dots, e_n'$. Then $e_1', \dots, e_n'$ is orthonormal if and only if $P$ is orthogonal. We leave the details of the proof to the reader.


Theorem 10.88. The set of orthogonal linear transformations on a Euclidean space $V$ forms a group under composition. In more concrete terms, the composition of two orthogonal transformations is an orthogonal transformation, and the inverse of an orthogonal transformation is an orthogonal transformation.


Proof. If $T_1, T_2$ are orthogonal linear transformations, then $T_1 \circ T_2$ is a linear transformation and

$$\|(T_1 \circ T_2)(x)\| = \|T_1(T_2(x))\| = \|T_2(x)\| = \|x\|$$

for all $x \in V$, thus $T_1 \circ T_2$ is an orthogonal linear transformation on $V$ by Problem 10.85. Similarly, we prove that the inverse of an orthogonal transformation is an orthogonal transformation. The result follows. □


The group O(V) of orthogonal transformations (or isometries) of V is called the 
orthogonal group of V. It is the group of automorphisms of the Euclidean space V 
and plays a crucial role in understanding the space V. 


Problem 10.89. Let $V$ be a Euclidean space and let $T$ be an orthogonal linear transformation on $V$. Let $W$ be a subspace of $V$ which is stable under $T$.

a) Prove that $T(W) = W$ and $T(W^{\perp}) = W^{\perp}$.
b) Prove that the restriction of $T$ to $W$ (respectively $W^{\perp}$) is an orthogonal linear transformation on $W$ (respectively $W^{\perp}$).


Solution. a) This follows easily from Problems 10.85 and 10.78, but for the reader's convenience we give a direct argument. Since $T$ maps $W$ into $W$ by assumption and since $T|_W$ is injective (because $T$ is injective on $V$), it follows that $T|_W : W \to W$ is surjective, thus $T(W) = W$. The same argument reduces the proof of the equality $T(W^{\perp}) = W^{\perp}$ to that of the inclusion $T(W^{\perp}) \subset W^{\perp}$. Let $x \in W^{\perp}$ and $y \in W$. We want to prove that $\langle T(x), y\rangle = 0$. But $T$ is orthogonal, so $T^* = T^{-1}$ (Problem 10.85) and so

$$\langle T(x), y\rangle = \langle x, T^{-1}(y)\rangle.$$

Since $T(W) = W$, the subspace $W$ is stable under $T^{-1}$, so $T^{-1}(y) \in W$, and since $x \in W^{\perp}$, we must have $\langle x, T^{-1}(y)\rangle = 0$. Thus $\langle T(x), y\rangle = 0$ and we are done.

b) Let $T_1$ be the restriction of $T$ to $W$. Using Problem 10.85 we obtain for all $x \in W$

$$\|T_1(x)\| = \|T(x)\| = \|x\|,$$

thus using Problem 10.85 again we obtain that $T_1$ is an orthogonal linear map on $W$. The argument for $W^{\perp}$ being identical, the problem is solved. □
We will now classify the orthogonal transformations of a Euclidean space in terms of simple transformations. The proof requires two preliminary results, which are themselves of independent interest.


Lemma 10.90. Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$. Then there is a line or a plane in $V$ which is stable under $T$.

Proof. The minimal polynomial of $T$ is a polynomial $P$ with real coefficients. If it has a real root, it follows that $T$ has an eigenvalue and so the line spanned by an eigenvector for that eigenvalue is stable under $T$. Suppose that $P$ has no real root. Let $z$ be a complex root of $P$. Then since $P$ has real coefficients, $\bar{z}$ is also a root of $P$ and so $Q = (X - z)(X - \bar{z})$ divides $P$. Moreover, $Q(T)$ is not invertible, otherwise $P/Q$ would be a polynomial of smaller degree killing $T$. Thus there is a nonzero vector $x \in V$ such that $Q(T)(x) = 0$. This can be written as $T^2(x) + aT(x) + bx = 0$ for some real numbers $a, b$. It follows that the space generated by $x$ and $T(x)$ is stable under $T$. It is a plane, because $x$ is not an eigenvector of $T$ (the eigenvalues of $T$ are roots of $P$, and $P$ has no real root). The lemma is proved. □


Lemma 10.91. Let $V$ be a two-dimensional Euclidean space and let $T$ be an orthogonal transformation on $V$ with no real eigenvalue. Then there is an orthonormal basis of $V$ with respect to which the matrix of $T$ is of the form

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

for some real number $\theta$.


Proof. Let $e_1, e_2$ be an arbitrary orthonormal basis of $V$ and write $T(e_1) = ae_1 + be_2$ for some real numbers $a, b$. Since

$$a^2 + b^2 = \|T(e_1)\|^2 = \|e_1\|^2 = 1,$$

we can find a real number $\theta$ such that $a = \cos\theta$ and $b = \sin\theta$. The orthogonal of $T(e_1)$ is the line $\mathbb{R}(-\sin\theta\, e_1 + \cos\theta\, e_2)$. Since $\langle T(e_1), T(e_2)\rangle = \langle e_1, e_2\rangle = 0$, we deduce that $T(e_2) \in \mathbb{R}(-\sin\theta\, e_1 + \cos\theta\, e_2)$ and so

$$T(e_2) = c(-\sin\theta\, e_1 + \cos\theta\, e_2)$$

for some real number $c$. Since

$$\|T(e_2)\| = \|e_2\| = 1,$$

we deduce that $|c| = 1$ and so $c \in \{-1, 1\}$. It remains to exclude the case $c = -1$. But if $c = -1$, then the matrix of $T$ with respect to $e_1, e_2$ is

$$A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$$

and one can easily check that its characteristic polynomial is $X^2 - 1$, which has real roots. It follows that if $c = -1$, then $T$ has a real eigenvalue, a contradiction. Hence $c = 1$ and the matrix of $T$ with respect to $e_1, e_2$ is $R_\theta$. □


We are now ready for the proof of the fundamental theorem classifying orthogonal linear transformations on a Euclidean space:


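Theorem 10.92. Let $V$ be a Euclidean space and let $T$ be an orthogonal transformation on $V$. Then we can find an orthonormal basis of $V$ with respect to which the matrix of $T$ is of the form

$$A = \begin{pmatrix} I_p & 0 & 0 & \cdots & 0 \\ 0 & -I_q & 0 & \cdots & 0 \\ 0 & 0 & R_{\theta_1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & R_{\theta_k} \end{pmatrix},$$

where $p, q, k$ are nonnegative integers with $p + q + 2k = \dim V$, $\theta_1, \dots, \theta_k$ are real numbers and

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$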

Proof. We will prove the result by induction on $\dim V$. If $\dim V = 1$, then everything is clear, since we must have $T = \pm\mathrm{id}$. Assume now that $\dim V = n \ge 2$ and that the result is known in dimension at most $n - 1$.

Suppose that $T$ has a real eigenvalue $\lambda$ and let $e_1$ be an eigenvector. Then

$$|\lambda|\,\|e_1\| = \|\lambda e_1\| = \|T(e_1)\| = \|e_1\|,$$

thus $\lambda \in \{-1, 1\}$. Let $W = \mathbb{R}e_1$; then $W$ is stable under $T$, hence $W^{\perp}$ is stable under $T$ (because $T$ is orthogonal). Moreover, the restriction of $T$ to $W^{\perp}$ is still an orthogonal transformation, since we have $\|T(x)\| = \|x\|$ for all $x \in V$, thus also for all $x \in W^{\perp}$. By the inductive hypothesis, $W^{\perp}$ has an orthonormal basis $e_2, \dots, e_n$ with respect to which the matrix of $T$ restricted to $W^{\perp}$ is of the right shape (i.e., as in the statement of the theorem). Adding the vector $\frac{e_1}{\|e_1\|}$ and possibly permuting the resulting orthonormal basis $\frac{e_1}{\|e_1\|}, e_2, \dots, e_n$ of $V$ yields an orthonormal basis with respect to which the matrix of $T$ has the desired shape.

Assume now that $T$ has no real eigenvalue. By Lemma 10.90 we can find a two-dimensional subspace $W$ of $V$ stable under $T$. Since $T$ is orthogonal, the space $W^{\perp}$ is also stable under $T$, and the restrictions of $T$ to $W$ and $W^{\perp}$ are orthogonal transformations on these spaces. By the inductive hypothesis $W^{\perp}$ has an orthonormal basis $e_3, \dots, e_n$ with respect to which the matrix of $T|_{W^{\perp}}$ is block-diagonal, with blocks of the form $R_{\theta_i}$. By Lemma 10.91 (applied to $T|_W$, which has no real eigenvalue) the space $W$ has an orthonormal basis $e_1, e_2$ with respect to which the matrix of $T|_W$ is of the form $R_\theta$. Then the matrix of $T$ with respect to $e_1, \dots, e_n$ has the desired shape. The theorem is proved. □


We can also rewrite the previous theorem purely in terms of matrices: 


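Corollary 10.93. Let $A \in M_n(\mathbb{R})$ be an orthogonal matrix. There is an orthogonal matrix $P \in M_n(\mathbb{R})$, nonnegative integers $p, q, k$ such that $p + q + 2k = n$ and real numbers $\theta_1, \dots, \theta_k$ such that

$$A = {}^tP \begin{pmatrix} I_p & 0 & 0 & \cdots & 0 \\ 0 & -I_q & 0 & \cdots & 0 \\ 0 & 0 & R_{\theta_1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & R_{\theta_k} \end{pmatrix} P.$$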
Remark 10.94. a) The determinant of the matrix

$$\begin{pmatrix} I_p & 0 & 0 & \cdots & 0 \\ 0 & -I_q & 0 & \cdots & 0 \\ 0 & 0 & R_{\theta_1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & R_{\theta_k} \end{pmatrix}$$

is $(-1)^q \in \{-1, 1\}$, since $\det R_{\theta_i} = 1$ for $1 \le i \le k$. It follows that

$$\det T \in \{-1, 1\}$$

for any orthogonal transformation $T$ on $V$. Equivalently, $\det A \in \{-1, 1\}$ for any orthogonal matrix $A \in M_n(\mathbb{R})$. Of course, we can prove this directly, without using the previous difficult theorem: since $A \cdot {}^tA = I_n$ and $\det({}^tA) = \det A$, we deduce that

$$1 = \det(A \cdot {}^tA) = (\det A)^2,$$

thus $\det A \in \{-1, 1\}$.

An isometry $T$ with $\det T = 1$ is called a positive isometry, while an isometry $T$ with $\det T = -1$ is called a negative isometry. Geometrically, positive isometries preserve the orientation of the space, while negative ones reverse the orientation.

We can use the previous remark to define the notion of oriented orthonormal basis of $V$. Fix an orthonormal basis $\mathcal{B} = (e_1, \dots, e_n)$ of $V$. If $\mathcal{B}' = (f_1, \dots, f_n)$ is another orthonormal basis of $V$, then the change of basis matrix $P$ from $\mathcal{B}$ to $\mathcal{B}'$ is orthogonal, thus $\det P \in \{-1, 1\}$. We say that $\mathcal{B}'$ is positive or positively oriented (with respect to $\mathcal{B}$) if $\det P = 1$, and negative or negatively oriented (with respect to $\mathcal{B}$) if $\det P = -1$. If $V = \mathbb{R}^n$ is endowed with the canonical inner product, then we always take for $\mathcal{B}$ the canonical basis, so we simply say that an orthonormal basis is positive or negative if it is positive or negative with respect to the canonical basis of $\mathbb{R}^n$.

b) The characteristic polynomial of the matrix

$$\begin{pmatrix} I_p & 0 & 0 & \cdots & 0 \\ 0 & -I_q & 0 & \cdots & 0 \\ 0 & 0 & R_{\theta_1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & R_{\theta_k} \end{pmatrix}$$

is

$$(X - 1)^p (X + 1)^q \prod_{i=1}^{k} \left(X^2 - 2\cos\theta_i \, X + 1\right).$$

Notice that the complex roots of the polynomial $X^2 - 2\cos\theta \, X + 1$ are $e^{i\theta}$ and $e^{-i\theta}$, and they have absolute value 1. We deduce from the previous theorem that if $\lambda$ is a complex root of the characteristic polynomial of an orthogonal matrix, then $|\lambda| = 1$. In other words, all complex eigenvalues of an orthogonal matrix have absolute value 1.

This can also be proved directly, but the proof is trickier than the one showing that $\det A \in \{-1, 1\}$ for an orthogonal matrix $A$.
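The claim that the roots of $X^2 - 2\cos\theta\,X + 1$ lie on the unit circle is easy to test numerically. The following is a minimal Python sketch using only the standard library. The helper name `rotation_block_roots` is our own, chosen for illustration. It computes both roots with the quadratic formula and prints their absolute values.

```python
import cmath
import math


def rotation_block_roots(theta):
    """Return the two complex roots of X^2 - 2 cos(theta) X + 1 (quadratic formula)."""
    b = -2 * math.cos(theta)
    # Mathematically b^2 - 4 = -4 sin^2(theta) <= 0, so this square root is purely imaginary.
    disc = cmath.sqrt(b * b - 4)
    return (-b + disc) / 2, (-b - disc) / 2


for theta in (0.3, 1.0, 2.5, math.pi):
    r1, r2 = rotation_block_roots(theta)
    # The roots are e^{i theta} and e^{-i theta}, so both moduli should be 1 up to rounding.
    print(f"theta={theta:.4f}  |r1|={abs(r1):.15f}  |r2|={abs(r2):.15f}")
```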


Let us try to study the orthogonal group in small dimension, starting with dimension 2. We could use the previous theorem, but we prefer to give direct arguments in this case, since everything can be done by hand in a fairly simple and explicit way. So, let us try to understand orthogonal matrices $A \in M_2(\mathbb{R})$. Consider a matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

satisfying $A \cdot {}^tA = I_2$. We know by the previous discussion that $\det A \in \{-1, 1\}$ (recall that this is immediate from the relation $A \cdot {}^tA = I_2$). Therefore, it is natural to consider two cases:


• $\det A = 1$. In this case the inverse of $A$ is simply

$$A^{-1} = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

and since $A$ is orthogonal we have $A^{-1} = {}^tA$. This gives $a = d$ and $b = -c$, that is,

$$A = \begin{pmatrix} a & -c \\ c & a \end{pmatrix}.$$

Moreover, we have $a^2 + c^2 = 1$, thus there is a unique real number $\theta \in (-\pi, \pi]$ such that $a = \cos\theta$ and $c = \sin\theta$. Therefore

$$A = R_\theta := \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

The corresponding linear transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$ (sending $X$ to $AX$) is given by

$$T(x, y) = (\cos\theta \, x - \sin\theta \, y, \ \sin\theta \, x + \cos\theta \, y),$$

and geometrically this is the rotation of angle $\theta$. A simple computation shows that

$$R_{\theta_1} \cdot R_{\theta_2} = R_{\theta_1 + \theta_2} = R_{\theta_2} \cdot R_{\theta_1} \qquad (10.7)$$

for all real numbers $\theta_1, \theta_2$. In particular, all rotations commute with each other.

An important consequence of this observation is that the matrix of $T$ with respect to any positive orthonormal basis of $\mathbb{R}^2$ is still $R_\theta$. Indeed, the change of basis matrix from the canonical basis to this new positive orthonormal basis is itself a rotation, thus it commutes with $R_\theta$. Similarly, one checks that the matrix of $T$ with respect to any negative orthonormal basis of $\mathbb{R}^2$ is $R_{-\theta}$.

Formula (10.7) also shows that it is very easy to find the angle of the composite of two rotations. Simply add their angles and subtract a suitable multiple of $2\pi$ to bring the result into the interval $(-\pi, \pi]$.


• $\det A = -1$. Now the inverse of $A$ is $A^{-1} = \begin{pmatrix} -d & b \\ c & -a \end{pmatrix}$, thus the condition $A^{-1} = {}^tA$ yields $d = -a$ and $b = c$. Also, we have $a^2 + b^2 = 1$, thus there is a unique real number $\theta \in (-\pi, \pi]$ such that $a = \cos\theta$ and $b = \sin\theta$. Then

$$A = S_\theta := \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$

Note that $S_\theta$ is symmetric and orthogonal, thus $S_\theta^2 = I_2$. The corresponding transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$,

$$T(x, y) = (\cos\theta \, x + \sin\theta \, y, \ \sin\theta \, x - \cos\theta \, y),$$

is an orthogonal symmetry. In order to find the line with respect to which $T$ is an orthogonal symmetry, it suffices to solve the system $AX = X$. An easy computation left to the reader shows that the system is equivalent to

$$\sin\left(\tfrac{\theta}{2}\right) x = \cos\left(\tfrac{\theta}{2}\right) y,$$

and so the line of solutions of $AX = X$ is spanned by the vector

$$e_1 = \left(\cos\left(\tfrac{\theta}{2}\right), \sin\left(\tfrac{\theta}{2}\right)\right).$$

Note that the orthogonal of this line is spanned by the vector

$$e_2 = \left(-\sin\left(\tfrac{\theta}{2}\right), \cos\left(\tfrac{\theta}{2}\right)\right).$$

The vectors $e_1, e_2$ form a positive orthonormal basis of $\mathbb{R}^2$ in which the matrix of $T$ is $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.

One can easily check that

$$S_{\theta_1} \cdot S_{\theta_2} = R_{\theta_1 - \theta_2},$$

thus the composite of two orthogonal symmetries is a rotation. This was actually clear from the beginning, since the product of two matrices of determinant $-1$ is a matrix of determinant $1$. Similarly, one checks that

$$S_{\theta_1} \cdot R_{\theta_2} = S_{\theta_1 - \theta_2}, \qquad R_{\theta_1} \cdot S_{\theta_2} = S_{\theta_1 + \theta_2},$$

thus the composite of a rotation and an orthogonal symmetry is an orthogonal symmetry (this was also clear for determinant reasons).
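The composition rules can be spot-checked numerically:

- formula (10.7);
- $S_{\theta_1} S_{\theta_2} = R_{\theta_1 - \theta_2}$;
- $S_{\theta_1} R_{\theta_2} = S_{\theta_1 - \theta_2}$;
- $R_{\theta_1} S_{\theta_2} = S_{\theta_1 + \theta_2}$.

Below is a small Python sketch, with matrices stored as nested lists. The function names `R`, `S`, `mul`, `close` and `reduce_angle` are our own. `reduce_angle` performs the reduction of an angle to $(-\pi, \pi]$ mentioned after (10.7). Here it is implemented with `math.remainder` (Python 3.7+), which is one possible choice among several.

```python
import math


def R(t):
    """The rotation matrix R_t, as a 2x2 nested list."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def S(t):
    """The symmetry matrix S_t, as a 2x2 nested list."""
    return [[math.cos(t), math.sin(t)], [math.sin(t), -math.cos(t)]]


def mul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def close(X, Y, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices up to a floating-point tolerance."""
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))


def reduce_angle(t):
    """Bring t into (-pi, pi] by subtracting a multiple of 2*pi."""
    r = math.remainder(t, 2 * math.pi)  # r lies in [-pi, pi]
    return math.pi if r <= -math.pi else r


a, b = 0.7, -2.1
print(close(mul(R(a), R(b)), R(a + b)))  # formula (10.7)
print(close(mul(S(a), S(b)), R(a - b)))  # S_a S_b = R_{a-b}
print(close(mul(S(a), R(b)), S(a - b)))  # S_a R_b = S_{a-b}
print(close(mul(R(a), S(b)), S(a + b)))  # R_a S_b = S_{a+b}
print(reduce_angle(3.0 + 2.0))           # angle of R_3 R_2, brought back into (-pi, pi]
```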
All in all, the previous discussion gives 


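Theorem 10.95. Let $A \in M_2(\mathbb{R})$ be an orthogonal matrix.

a) If $\det A = 1$, then

$$A = R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

for a unique real number $\theta \in (-\pi, \pi]$, and the corresponding linear transformation $T$ on $\mathbb{R}^2$ is the rotation of angle $\theta$. Any two such matrices commute and the matrix of $T$ in any positive orthonormal basis of $\mathbb{R}^2$ is $R_\theta$.

b) If $\det A = -1$, then

$$A = S_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$$

for a unique real number $\theta \in (-\pi, \pi]$. The matrix $A$ is symmetric and the corresponding linear transformation on $\mathbb{R}^2$ is the orthogonal symmetry with respect to the line spanned by $\left(\cos\left(\tfrac{\theta}{2}\right), \sin\left(\tfrac{\theta}{2}\right)\right)$.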
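Theorem 10.95 translates directly into a small classification routine. In the Python sketch below, the name `classify_2x2` and the tolerance `tol` are our own choices.

- The angle $\theta$ is read off from the first column $(\cos\theta, \sin\theta)$ with `math.atan2`. That function returns a value in $[-\pi, \pi]$, so the value $-\pi$ is replaced by $\pi$ to match the convention $\theta \in (-\pi, \pi]$.
- The sign of the determinant decides between a rotation and a symmetry.
- In the case of a symmetry, the fixed line is spanned by $(\cos(\theta/2), \sin(\theta/2))$.

```python
import math


def classify_2x2(A, tol=1e-9):
    """Classify a 2x2 orthogonal matrix A = [[a, b], [c, d]] as in Theorem 10.95.

    Returns ("rotation", theta) when det A = 1 (A = R_theta), and
    ("symmetry", theta, axis) when det A = -1 (A = S_theta), where axis is a unit
    vector spanning the line fixed by A. In both cases theta lies in (-pi, pi].
    """
    (a, b), (c, d) = A
    # The columns must be orthonormal for A to be orthogonal.
    if abs(a * a + c * c - 1) > tol or abs(b * b + d * d - 1) > tol or abs(a * b + c * d) > tol:
        raise ValueError("A is not orthogonal")
    theta = math.atan2(c, a)  # the first column is (cos theta, sin theta) in both cases
    if theta == -math.pi:
        theta = math.pi
    if a * d - b * c > 0:
        return ("rotation", theta)
    half = theta / 2
    return ("symmetry", theta, (math.cos(half), math.sin(half)))


print(classify_2x2([[0.0, -1.0], [1.0, 0.0]]))  # rotation by pi/2
print(classify_2x2([[0.0, 1.0], [1.0, 0.0]]))   # symmetry about the line y = x
```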
Let us now consider the more complicated case $\dim V = 3$. Here it is no longer easy to do explicit computations. We will therefore use Theorem 10.92 and our understanding of the case $\dim V = 2$ in order to understand the case $\dim V = 3$.

Recall the integers $p, q, k$ from Theorem 10.92. Since

$$p + q + 2k = 3,$$

we see that necessarily $p \neq 0$ or $q \neq 0$. We can also prove this directly. The characteristic polynomial of $T$ has degree 3, thus it has a real root, and so $T$ has a real eigenvalue. This eigenvalue is necessarily equal to $-1$ or $1$, since it has absolute value 1.

Replacing $T$ with $-T$, we exchange the roles of $p$ and $q$. For simplicity, let us assume that $p \geq 1$, i.e., $T$ has at least one nonzero fixed point $v$. Then $T$ fixes the line $D$ spanned by $v$, and induces an isometry on the plane $P$ orthogonal to $D$. This isometry is classified by Theorem 10.95, which deals with isometries of a plane. Thus we have reduced the case $\dim V = 3$ to the case $\dim V = 2$. We can be a little more explicit, by discussing the following cases:


• Either $T$ or $-T$ is the identity map. This case is not very interesting.

• We have $\dim \ker(T - \mathrm{id}) = 2$. Let $e_2, e_3$ be an orthonormal basis of the plane $\ker(T - \mathrm{id})$, completed to an orthonormal basis $e_1, e_2, e_3$ of $V$. Then $T$ fixes $\operatorname{Span}(e_2, e_3)$ pointwise and leaves invariant the line spanned by $e_1$. Thus the matrix of $T$ with respect to $e_1, e_2, e_3$ is of the form $\begin{pmatrix} \lambda & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ for some real number $\lambda$. This number is necessarily $-1$: it must be $-1$ or $1$ since the matrix must be orthogonal, and it cannot be $1$ as otherwise $T = \mathrm{id}$. We deduce that $T$ is the orthogonal symmetry with respect to the plane $\ker(T - \mathrm{id})$. Notice that $\det T = -1$ in this case (i.e., $T$ is a negative isometry).

• We have $\dim \ker(T - \mathrm{id}) = 1$, thus $\ker(T - \mathrm{id})$ is the line spanned by some vector $e_1$ of norm 1. Complete $e_1$ to a positive orthonormal basis $e_1, e_2, e_3$ of $V = \mathbb{R}^3$. For instance, one can simply find a vector $e_2$ of norm 1 orthogonal to $e_1$, and if

$$e_1 = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \quad \text{and} \quad e_2 = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix},$$

set

$$e_3 = e_1 \wedge e_2 := \begin{pmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{pmatrix}.$$

The isometry that $T$ induces on $\operatorname{Span}(e_2, e_3)$ has no nonzero fixed point, since all fixed points of $T$ are on the line spanned by $e_1$. Thus it is a rotation of angle $\theta$ for a unique real number $\theta \in (-\pi, \pi]$. The matrix of $T$ with respect to $e_1, e_2, e_3$ is

$$R_\theta := \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}.$$

We say that $T$ is the rotation of angle $\theta$ around the axis $\mathbb{R}e_1$. Note that $\det T = 1$, that is, $T$ is a positive isometry. Also, if $A$ is the matrix of $T$ in any basis, the angle $\theta$ satisfies

$$1 + 2\cos\theta = \operatorname{Tr}(A),$$

but this relation does not uniquely characterize the angle $\theta$ (since $-\theta$ is also a solution of that equation). In order to find $\theta$, it remains to find $\sin\theta$. To do that, one checks that

$$\det_{(e_1, e_2, e_3)}(e_1, e_2, T(e_2)) = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & \cos\theta \\ 0 & 0 & \sin\theta \end{vmatrix} = \sin\theta.$$

Finally, assume that $\ker(T - \mathrm{id}) = \{0\}$. One possibility is that $T = -\mathrm{id}$, so assume that $T \neq -\mathrm{id}$.

Either $T$ or $-T$ has a nonzero fixed point: this follows from the fact that $p$ or $q$ is nonzero, i.e., that $T$ has a real eigenvalue, which must be $\pm 1$. Since $T$ has no nonzero fixed point, it follows that $-T$ has one. Let $e_1$ be a vector of norm 1 which is fixed by $-T$, thus $T(e_1) = -e_1$. Complete $e_1$ to a positive orthonormal basis $e_1, e_2, e_3$ of $V$. Arguing as in the previous case, we deduce that the matrix of $T$ with respect to $e_1, e_2, e_3$ is

$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} = R_\theta \cdot \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

for some $\theta \in (-\pi, \pi]$. Thus $T$ is the composite of two maps: a rotation of angle $\theta$, and an orthogonal symmetry with respect to the orthogonal of the axis of the rotation. Also, notice that $\det T = -1$, thus $T$ is a negative isometry.
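The case $\dim\ker(T - \mathrm{id}) = 1$ uses a construction of a positive orthonormal basis $e_1, e_2, e_3 = e_1 \wedge e_2$. It can be sketched in Python as follows; the helper names are ours.

The text leaves the choice of $e_2$ open. Here, as an assumption of ours, we take $e_1 \wedge f$, where $f$ is the canonical basis vector along which $e_1$ has its smallest coordinate in absolute value, and normalize it. With this choice $e_1$ and $f$ are never collinear.

```python
import math


def cross(u, v):
    """The wedge product u ^ v, with the coordinate formula used in the text for e1 ^ e2."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def det3(u, v, w):
    """Determinant of the 3x3 matrix with columns u, v, w, computed as u . (v ^ w)."""
    return dot(u, cross(v, w))


def complete_to_positive_basis(e1):
    """Complete a unit vector e1 of R^3 to a positive orthonormal basis (e1, e2, e3)."""
    k = min(range(3), key=lambda i: abs(e1[i]))  # coordinate where |e1| is smallest
    f = tuple(1.0 if i == k else 0.0 for i in range(3))
    w = cross(e1, f)  # orthogonal to e1, and nonzero since e1 and f are not collinear
    norm = math.sqrt(dot(w, w))
    e2 = tuple(x / norm for x in w)
    e3 = cross(e1, e2)  # a unit vector, since e1 and e2 are orthonormal
    return e1, e2, e3


s = 1 / math.sqrt(2)
e1, e2, e3 = complete_to_positive_basis((s, 0.0, s))
print(e2, e3, det3(e1, e2, e3))  # the determinant should be 1
```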

We can also slightly change the point of view and discuss the situation in terms of matrices. Consider an orthogonal matrix $A \in M_3(\mathbb{R})$ and the associated linear transformation $T: V \to V$ sending $X$ to $AX$, where $V = \mathbb{R}^3$ is endowed with the canonical inner product. We exclude the trivial cases $A = \pm I_3$. In order to study the isometry $T$, we first check whether $T$ is a positive or negative isometry by computing $\det T = \det A$.

Assume first that $T$ is positive, i.e., $\det A = 1$. We then check whether $A$ is symmetric, i.e., $A = {}^tA$. Let us consider two cases:
• If $A$ is symmetric, then $A^2 = I_3$ (since $A$ is orthogonal and symmetric) and so $T$ is an orthogonal symmetry. We claim that $T$ is the orthogonal symmetry with respect to a line.

Indeed, since $A^2 = I_3$, all eigenvalues of $A$ are $-1$ or $1$. They are not all equal, since we excluded the cases $A = \pm I_3$. Their product is $1$, since $\det A = 1$. Thus one eigenvalue is $1$ and the other two are equal to $-1$.

It follows that the matrix of $T$ with respect to some orthonormal basis $e_1, e_2, e_3$ of $\mathbb{R}^3$ is $\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$. Hence $T$ is the orthogonal symmetry with respect to the line spanned by $e_1$. To find this line, we compute $\ker(A - I_3)$ by solving the system $AX = X$. A basis $v$ of the space of solutions of this system spans the line we are looking for.

• If $A$ is not symmetric, then $T$ is a rotation of angle $\theta$ for a unique $\theta \in (-\pi, \pi]$. We find the axis of the rotation by solving the system $AX = X$: if $Ae_1 = e_1$ for some unit vector $e_1$, then the axis of the rotation is spanned by $e_1$. To find the angle of the rotation, we start by using the relation

$$1 + 2\cos\theta = \operatorname{Tr}(A), \qquad (*)$$

which pins down $\theta$ up to a sign. Next, we choose any vector $e_2$ of norm 1 orthogonal to $e_1$ and we set $e_3 = e_1 \wedge e_2$. Then $e_1, e_2, e_3$ is a positive orthonormal basis of $\mathbb{R}^3$, and $\det_{(e_1, e_2, e_3)}(e_1, e_2, Ae_2)$ gives $\sin\theta$, which then determines $\theta$ uniquely.

In practice it suffices to find the sign of the determinant of the vectors $e_1, e_2, Ae_2$ with respect to the canonical basis of $\mathbb{R}^3$. This sign gives the sign of $\sin\theta$, which in turn determines $\theta$ uniquely thanks to relation $(*)$.


Assume now that $T$ is negative, i.e., $\det A = -1$. Then $-T$ is positive, thus the previous discussion applies to $-T$.
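The recipe above for a positive, non-symmetric $A \in M_3(\mathbb{R})$ can be turned into a short Python sketch. The function and helper names are ours. Two implementation choices are assumptions of ours rather than part of the text:

- The system $AX = X$ is solved by taking the cross product of two independent rows of $A - I_3$. For a rotation of angle $\theta \notin \{0, \pi\}$ this matrix has rank 2, so such a cross product spans its kernel.
- Instead of using only the sign of $\det(e_1, e_2, Ae_2)$, we use its value. That value equals $\sin\theta$ because $e_1, e_2, e_1 \wedge e_2$ is a positive orthonormal basis. We then recover $\theta$ with `math.atan2`.

Replacing $e_1$ by $-e_1$ changes $\theta$ into $-\theta$, so the angle is only meaningful together with the chosen orientation of the axis. The usage line applies the sketch to the matrix of Problem 10.97.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def matvec(A, x):
    return tuple(dot(row, x) for row in A)


def rotation_axis_angle(A):
    """Unit axis e1 and angle theta of a non-symmetric rotation A of R^3 (det A = 1)."""
    M = [[A[i][j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]  # A - I_3
    # Solve AX = X: the cross product of two independent rows of A - I_3 spans its kernel.
    candidates = [cross(M[i], M[j]) for i, j in ((0, 1), (0, 2), (1, 2))]
    e1 = normalize(max(candidates, key=lambda v: dot(v, v)))
    # Any unit vector e2 orthogonal to e1; then e1, e2, e1 ^ e2 is a positive orthonormal basis.
    k = min(range(3), key=lambda i: abs(e1[i]))
    f = tuple(1.0 if i == k else 0.0 for i in range(3))
    e2 = normalize(cross(e1, f))
    cos_t = (A[0][0] + A[1][1] + A[2][2] - 1) / 2  # relation (*): 1 + 2 cos(theta) = Tr(A)
    sin_t = dot(e1, cross(e2, matvec(A, e2)))      # det(e1, e2, A e2) = sin(theta)
    return e1, math.atan2(sin_t, cos_t)


# The matrix of Problem 10.97.
A = [[2 / 3, 2 / 3, 1 / 3], [-2 / 3, 1 / 3, 2 / 3], [1 / 3, -2 / 3, 2 / 3]]
e1, theta = rotation_axis_angle(A)
print(e1, theta, -math.acos(1 / 3))
```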
Let us see two concrete examples: 


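Problem 10.96. a) Prove that

$$A = \frac{1}{3}\begin{pmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{pmatrix}$$

is an orthogonal matrix.

b) Describe the isometry of $\mathbb{R}^3$ defined by $A$, i.e., the map $T: \mathbb{R}^3 \to \mathbb{R}^3$ given by $T(X) = AX$.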


Solution. a) Using the rules of matrix multiplication, one easily checks that $A \cdot {}^tA = I_3$, thus $A$ is orthogonal. Alternatively, one checks that the columns (or rows) of $A$ form an orthonormal family.

b) First, we check whether $T$ is a positive or negative isometry by computing $\det A$. An easy computation shows that $\det A = 1$, so $T$ is a positive isometry. Since $A$ is symmetric, we deduce from the above discussion that $T$ is the orthogonal symmetry with respect to a line. To find this line, we solve the system $AX = X$. If $x, y, z$ are the coordinates of $X$, the system is equivalent to

$$\begin{cases} -x + 2y + 2z = 3x \\ 2x - y + 2z = 3y \\ 2x + 2y - z = 3z \end{cases}$$

and has the solutions $x = y = z$. Thus $T$ is the orthogonal symmetry with respect to the line spanned by $(1, 1, 1)$. □
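Both parts of Problem 10.96 can be double-checked in exact rational arithmetic with Python's `fractions` module. The short sketch below (helper names ours) checks four facts:

- $A \cdot {}^tA = I_3$;
- $\det A = 1$;
- $A$ is symmetric;
- $(1, 1, 1)$ is fixed by $A$.

By the discussion above, these facts identify $T$ as the orthogonal symmetry with respect to the line spanned by $(1, 1, 1)$.

```python
from fractions import Fraction

# A = (1/3) * [[-1, 2, 2], [2, -1, 2], [2, 2, -1]] with exact rational entries.
A = [[Fraction(v, 3) for v in row] for row in ([-1, 2, 2], [2, -1, 2], [2, 2, -1])]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


identity = [[Fraction(int(r == c)) for c in range(3)] for r in range(3)]
is_orthogonal = matmul(A, transpose(A)) == identity
is_symmetric = A == transpose(A)
fixes_111 = [sum(row) for row in A] == [1, 1, 1]  # A (1, 1, 1) = (1, 1, 1)
print(is_orthogonal, det3(A), is_symmetric, fixes_111)
```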


Problem 10.97. Prove that the matrix

$$A = \frac{1}{3}\begin{pmatrix} 2 & 2 & 1 \\ -2 & 1 & 2 \\ 1 & -2 & 2 \end{pmatrix}$$

is orthogonal and study the associated isometry of $\mathbb{R}^3$.


Solution. One easily checks either that $A \cdot {}^tA = I_3$ or that the rows of $A$ form an orthonormal family. Next, one computes $\det A = 1$, thus the associated isometry $T$ is positive. Since $A$ is not symmetric, it follows that $T$ is a rotation. To find its axis, we solve the system $AX = X$, which is equivalent to

$$\begin{cases} 2x + 2y + z = 3x \\ -2x + y + 2z = 3y \\ x - 2y + 2z = 3z \end{cases}$$

and then to

$$x = z, \quad y = 0.$$

Thus the axis of the rotation is spanned by the vector $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$. We normalize it to make it have norm 1, thus we consider instead the vector

$$e_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},$$

which spans the axis of $T$.

Let $\theta$ be the angle of the rotation, so that

$$1 + 2\cos\theta = \operatorname{Tr}(A) = \frac{5}{3},$$

thus

$$\cos\theta = \frac{1}{3}.$$

It remains to find the sign of $\sin\theta$. For that, we choose a unit vector orthogonal to $e_1$, say

$$e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},$$

and compute the sign of

$$\det(e_1, e_2, Ae_2) = \frac{1}{3\sqrt{2}}\begin{vmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 1 & 0 & -2 \end{vmatrix} = -\frac{4}{3\sqrt{2}} < 0.$$

Thus $\sin\theta < 0$, and finally

$$\theta = -\arccos\left(\frac{1}{3}\right).$$

□

10.7.1 Problems for Practice 


1. Prove the result stated in Remark 10.87. 
2. a) Prove that the matrix

is orthogonal.

b) Describe the isometry $T: \mathbb{R}^2 \to \mathbb{R}^2$ sending $X$ to $AX$: is it positive or negative? If it is a rotation, describe the angle. If it is a symmetry, describe the line with respect to which $T$ is the orthogonal symmetry.

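3. a) Prove that each of the following matrices is orthogonal:

$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \frac{1}{3}\begin{pmatrix} 1 & 2 & 2 \\ 2 & 1 & -2 \\ 2 & -2 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}.$$

b) If $A$ is one of these matrices, describe the isometry $T: \mathbb{R}^3 \to \mathbb{R}^3$ sending $X$ to $AX$. For instance, if $T$ is a rotation, then you will need to find the axis and the angle of the corresponding rotation.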


4. Prove that the matrix

$$A = \frac{1}{7}\begin{pmatrix} 2 & -6 & 3 \\ -6 & -3 & -2 \\ 3 & -2 & -6 \end{pmatrix}$$

is orthogonal and study the associated isometry of $\mathbb{R}^3$.

5. Find the matrix of the rotation of angle … around the line spanned by $(1, 1, 1)$.

6. Let $V$ be a Euclidean space and let $T: V \to V$ be a linear map. Prove that $T$ is orthogonal if and only if $\|T(x)\| = 1$ whenever $\|x\| = 1$.

7. a) Describe all orthogonal matrices $A \in M_n(\mathbb{R})$ having integer entries.
b) How many such matrices are there?

8. a) Describe the matrices in $M_n(\mathbb{R})$ which are simultaneously diagonal and orthogonal.
b) Describe the matrices in $M_n(\mathbb{R})$ which are simultaneously upper-triangular and orthogonal.

9. Let $V$ be a Euclidean space. Recall that if $W$ is a subspace of $V$, then $s_W$ denotes the orthogonal symmetry with respect to $W$, that is, the symmetry with respect to $W$ along $W^\perp$.
a) Let $v$ be a vector in $V$ with $\|v\| = 1$ and let $H = (\mathbb{R}v)^\perp$ be its orthogonal. Prove that for all $x \in V$ we have

$$s_H(x) = x - 2\langle v, x\rangle v.$$

b) Let $v_1, v_2 \in V$ be vectors in $V$ with the same norm. Prove that there is a hyperplane $H$ of $V$ such that $s_H(v_1) = v_2$.

10. Find the matrix (in the canonical basis of $\mathbb{R}^3$) of the orthogonal symmetry of $\mathbb{R}^3$ with respect to the line spanned by $(1, 2, 3)$.

11. Find the matrix (in the canonical basis of $\mathbb{R}^3$) of the orthogonal symmetry of $\mathbb{R}^3$ with respect to the plane spanned by $(1, 1, 1)$ and $(0, 1, 0)$.

12. Let $V$ be a three-dimensional Euclidean space, let $r$ be a rotation on $V$ and let $s$ be an orthogonal symmetry. Prove that $s \circ r \circ s$ is a rotation and describe its axis and its angle in terms of those of $r$.

13. Let $V$ be a three-dimensional Euclidean space. When does a rotation of $V$ commute with an orthogonal symmetry of $V$?

14. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be an orthogonal matrix. Prove that

$$n \leq \sum_{i,j=1}^{n} |a_{ij}| \leq n\sqrt{n}.$$

Hint: the sum of squares of the elements in each row is 1. For the inequality on the right, use the Cauchy–Schwarz inequality.
15. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be an orthogonal matrix.
a) Let $X$ be the vector in $\mathbb{R}^n$ all of whose coordinates are equal to 1. Compute $\langle X, AX\rangle$, where $\langle\ ,\ \rangle$ is the standard inner product on $\mathbb{R}^n$.
b) Prove that

$$\left|\sum_{i,j=1}^{n} a_{ij}\right| \leq n.$$

16. Let $v \in \mathbb{R}^n$ be a nonzero vector. Find all real numbers $k$ for which the linear map $T: \mathbb{R}^n \to \mathbb{R}^n$ defined by

$$T(x) = x + k\langle x, v\rangle v$$

is an isometry.

17. Let $V$ be a Euclidean space and let $T: V \to V$ be a linear transformation such that $\langle T(x), T(y)\rangle = 0$ whenever $\langle x, y\rangle = 0$.
a) Let $x, y$ be vectors of norm 1 in $V$. Compute $\langle x + y, x - y\rangle$.
b) Prove that there is a nonnegative real number $k$ such that for all $x \in V$

$$\|T(x)\| = k\|x\|.$$

Hint: if $\|x\| = \|y\| = 1$, show that $\|T(x)\| = \|T(y)\|$ using part a) and the hypothesis.
c) Prove that there is an orthogonal transformation $S$ on $V$ such that $T = kS$.

18. Let $V = M_n(\mathbb{R})$ be endowed with the inner product

$$\langle A, B\rangle = \operatorname{Tr}({}^tA B).$$

Let $A \in V$. Prove that the following statements are equivalent:
a) $A$ is orthogonal.
b) The linear transformation $T: V \to V$ sending $B$ to $AB$ is orthogonal.

19. (Cayley transform)
a) Let $A \in M_n(\mathbb{R})$ be a skew-symmetric matrix. Prove that $I_n + A$ is invertible. Hint: if $AX = -X$, compute $\langle AX, X\rangle$ in two different ways.
b) Prove that if $A \in M_n(\mathbb{R})$ is skew-symmetric, then $(I_n - A)(I_n + A)^{-1}$ is an orthogonal matrix which does not have $-1$ as an eigenvalue.
c) Conversely, prove that if $B$ is an orthogonal matrix not having $-1$ as an eigenvalue, then we can find a skew-symmetric matrix $A$ such that $B = (I_n - A)(I_n + A)^{-1}$.
d) Prove that the map $A \mapsto (I_n - A)(I_n + A)^{-1}$ induces a bijection between the skew-symmetric matrices in $M_n(\mathbb{R})$ and the orthogonal matrices in $M_n(\mathbb{R})$ for which $-1$ is not an eigenvalue.
20. (Compactness of the orthogonal group) Let $(A_k)_{k \geq 1}$ be a sequence of orthogonal matrices in $M_n(\mathbb{R})$, and let $a_{ij}^{(k)}$ be the $(i, j)$-entry of $A_k$. Prove that there exists a sequence of integers $k_1 < k_2 < \dots$ with the following two properties:
- for all $i, j \in \{1, 2, \dots, n\}$ the sequence $(a_{ij}^{(k_l)})_{l \geq 1}$ converges to some real number $x_{ij}$;
- the matrix $X = [x_{ij}]$ is an orthogonal matrix.

Hint: use the classical fact from real analysis that each sequence in $[-1, 1]$ has a convergent subsequence.

21. Let $A \in M_n(\mathbb{R})$ be a skew-symmetric matrix and let $T: \mathbb{R}^n \to \mathbb{R}^n$ be the map $X \mapsto AX$. Prove that there is an orthonormal basis of $\mathbb{R}^n$ with respect to which the matrix of $T$ is a block-diagonal matrix, in which each block is either the zero matrix or a matrix of the form $\begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}$ for some real number $a$. Hint: use induction on $n$ and Lemma 10.90, and argue as in the proof of Theorem 10.92.

22. Prove that if $A \in M_n(\mathbb{R})$ is a skew-symmetric matrix, then $\det A \geq 0$ and the rank of $A$ is even.

In the following problems we consider a finite dimensional vector space $V$ over $\mathbb{C}$ endowed with a positive definite hermitian product $\langle\ ,\ \rangle$ and associated norm $\|\cdot\|$. A linear map $T: V \to V$ is called unitary or an isometry if

$$\langle T(x), T(y)\rangle = \langle x, y\rangle$$

for all $x, y \in V$. A matrix $A \in M_n(\mathbb{C})$ is called unitary if the associated linear map $\mathbb{C}^n \to \mathbb{C}^n$ sending $X$ to $AX$ is unitary (where $\mathbb{C}^n$ is endowed with its standard hermitian product).

23. Prove that for a linear map $T: V \to V$ the following assertions are equivalent:
a) $T$ is unitary.
b) We have $\|T(x)\| = \|x\|$ for all $x \in V$.
c) $T$ maps unit vectors (i.e., vectors of norm 1) to unit vectors.

24. Prove that a matrix $A \in M_n(\mathbb{C})$ is unitary if and only if $A \cdot A^* = I_n$, where $A^* = {}^t\overline{A}$ is the conjugate transpose of $A$ (thus if $A = [a_{ij}]$ then $A^* = [\overline{a_{ji}}]$).

25. Prove that the inverse of a unitary matrix is a unitary matrix, and that the product of two unitary matrices is a unitary matrix.

26. Prove that if $A$ is a unitary matrix, then $|\det A| = 1$.

27. Describe the diagonal and unitary matrices in $M_n(\mathbb{C})$.

28. Prove that for a matrix $A \in M_n(\mathbb{C})$ the following assertions are equivalent:
a) $A$ is unitary.
b) There is an orthonormal basis $X_1, \dots, X_n$ of $\mathbb{C}^n$ (endowed with its standard hermitian product) such that $AX_1, \dots, AX_n$ is an orthonormal basis of $\mathbb{C}^n$.
c) For any orthonormal basis $X_1, \dots, X_n$ of $\mathbb{C}^n$ the vectors $AX_1, \dots, AX_n$ form an orthonormal basis of $\mathbb{C}^n$.

29. Let $T: V \to V$ be a unitary linear transformation on $V$. Prove that there is an orthogonal basis of $V$ consisting of eigenvectors of $T$.
10.8 The Spectral Theorem for Symmetric Linear Transformations and Matrices


In this section we will prove the fundamental theorem concerning real symmetric matrices or linear transformations. It classifies the symmetric linear transformations on a Euclidean space in the same way as Theorem 10.92 classifies orthogonal transformations.

We will then use this theorem to prove the rather amazing result that any matrix $A \in M_n(\mathbb{R})$ is the product of a symmetric positive matrix and of an orthogonal matrix. This result, called the polar decomposition, is the matrix analogue of a classical fact: any complex number can be written as the product of a nonnegative real number and of a complex number of magnitude 1.

We start by establishing a first fundamental property of real symmetric 
matrices: their complex eigenvalues are actually real. 


Theorem 10.98. Let $A \in M_n(\mathbb{R})$ be a symmetric matrix. Then all roots of the characteristic polynomial of $A$ are real.


Proof. Let $\lambda$ be a root of the characteristic polynomial of $A$, and let us see $A$ as a matrix in $M_n(\mathbb{C})$. Since $\det(\lambda I_n - A) = 0$, there exists a nonzero $X \in \mathbb{C}^n$ such that $AX = \lambda X$. Write $X = Y + iZ$ for two vectors $Y, Z \in \mathbb{R}^n$, and write $\lambda = a + ib$ for some real numbers $a, b$. The equality $AX = \lambda X$ becomes

$$AY + iAZ = (a + ib)(Y + iZ) = aY - bZ + i(aZ + bY),$$

and taking real and imaginary parts yields

$$AY = aY - bZ, \qquad AZ = aZ + bY. \qquad (10.8)$$

Since $A$ is symmetric, we have

$$\langle AY, Z\rangle = \langle Y, AZ\rangle. \qquad (10.9)$$

By relation (10.8), the left-hand side of relation (10.9) is equal to $a\langle Y, Z\rangle - b\|Z\|^2$, while the right-hand side is equal to $a\langle Y, Z\rangle + b\|Y\|^2$. We deduce that

$$b\left(\|Y\|^2 + \|Z\|^2\right) = 0.$$

At least one of $Y, Z$ is nonzero (otherwise $X = 0$, a contradiction), so we deduce that $b = 0$ and $\lambda$ is real. □


We need one further preliminary remark before proving the fundamental theorem:


Lemma 10.99. Let $V$ be a Euclidean space and let $T: V \to V$ be a symmetric linear transformation on $V$. Let $W$ be a subspace of $V$ which is stable under $T$. Then

a) $W^\perp$ is also stable under $T$.
b) The restrictions of $T$ to $W$ and $W^\perp$ are symmetric linear transformations on these spaces.

Proof. This follows fairly easily from Problem 10.78, but we prefer to give a straightforward argument.

a) Let $x \in W^\perp$ and $y \in W$. Then

$$\langle T(x), y\rangle = \langle x, T(y)\rangle.$$

Now $x \in W^\perp$ and $T(y) \in T(W) \subset W$, thus $\langle x, T(y)\rangle = 0$. Hence $\langle T(x), y\rangle = 0$ for all $y \in W$, that is, $T(x) \in W^\perp$. Therefore $T(W^\perp) \subset W^\perp$, which is the desired result.

b) Let $T_1$ be the restriction of $T$ to $W$. For $x, y \in W$ we have

$$\langle T_1(x), y\rangle = \langle T(x), y\rangle = \langle x, T(y)\rangle = \langle x, T_1(y)\rangle,$$

thus $T_1$ is symmetric as a linear map on $W$. The argument is identical for $W^\perp$, which is stable under $T$ by part a). This proves the lemma. □


We are finally in good shape for the fundamental theorem of the theory of symmetric linear transformations (or matrices), which shows that all such transformations are diagonalizable in an orthonormal basis:


Theorem 10.100 (Spectral Theorem). Let $V$ be a Euclidean space and let $T: V \to V$ be a symmetric linear transformation. Then there is an orthonormal basis of $V$ consisting of eigenvectors for $T$.

Proof. We will prove the theorem by strong induction on $n = \dim V$. Everything being clear when $n = 1$, suppose that the statement holds up to $n - 1$ and let us prove it for $n$. So let $V$ be Euclidean with $\dim V = n$ and let $T$ be a symmetric linear transformation on $V$. Let $e_1, \dots, e_n$ be an orthonormal basis of $V$. The matrix $A$ of $T$ in this basis is symmetric. Any matrix with real or complex entries has a complex eigenvalue, so by Theorem 10.98 the matrix $A$ has a real eigenvalue $\lambda$.

Let $W = \ker(\lambda\,\mathrm{id} - T)$ be the $\lambda$-eigenspace of $T$. If $W = V$, then $T = \lambda\,\mathrm{id}$ and so $e_1, \dots, e_n$ is an orthonormal basis consisting of eigenvectors for $T$. So assume that $\dim W < n$.

We have $V = W \oplus W^\perp$, and $T$ leaves $W^\perp$ stable, inducing a symmetric linear transformation on this subspace (Lemma 10.99). Since $W \neq \{0\}$, we have $\dim W^\perp \leq n - 1$. Applying the inductive hypothesis to the restriction of $T$ to $W^\perp$, we find an orthonormal basis $f_{s+1}, \dots, f_n$ of $W^\perp$ consisting of eigenvectors for $T$. Choose any orthonormal basis $f_1, \dots, f_s$ of $W$; it consists automatically of eigenvectors for $T$. We obtain an orthonormal basis $f_1, \dots, f_n$ of $V = W \oplus W^\perp$ consisting of eigenvectors for $T$. This finishes the proof of the theorem. □


If $A \in M_n(\mathbb{R})$ is a symmetric matrix, then the linear transformation $T: X \mapsto AX$ on $V = \mathbb{R}^n$ is symmetric. Applying the previous theorem, we can find an orthonormal basis of $V$ with respect to which the matrix of $T$ is diagonal. The canonical basis of $V$ is orthonormal, and the change of basis matrix between two orthonormal bases is orthogonal (Remark 10.87). We therefore obtain the following all-important result:


Theorem 10.101. Let $A \in M_n(\mathbb{R})$ be a symmetric matrix. There exists an orthogonal matrix $P \in M_n(\mathbb{R})$ such that $PAP^{-1}$ is diagonal (in particular, $A$ is diagonalizable). In other words, there is an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$.
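Theorem 10.101 is an existence statement. Its inductive proof is not a practical way of computing $P$.

One classical numerical method, which is not the one used in the text, is Jacobi's eigenvalue algorithm. It repeatedly conjugates $A$ by plane rotations. A plane rotation is an orthogonal matrix that acts as a $2 \times 2$ rotation on two coordinates and as the identity on the others. Each rotation is chosen to annihilate one off-diagonal entry, and the accumulated product of the rotations is orthogonal.

The following plain-Python sketch returns an orthogonal $Q$ with ${}^tQ A Q$ numerically diagonal, so $P = {}^tQ$ makes $PAP^{-1}$ diagonal. The function names, the tolerance and the sweep limit are our own choices.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def jacobi_eigen(A, tol=1e-12, max_sweeps=50):
    """Cyclic Jacobi method for a real symmetric matrix A (nested lists).

    Returns (eigenvalues, Q) with Q orthogonal and tQ A Q numerically diagonal;
    the columns of Q form an orthonormal basis of eigenvectors of A.
    """
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    Q = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < tol:
                    continue
                # t = tan(phi) is the smaller root of t^2 + 2 tau t - 1 = 0, which makes
                # the new (p, q) entry vanish.
                tau = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(tau * tau + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p] = J[q][q] = c
                J[p][q], J[q][p] = s, -s
                A = matmul(transpose(J), matmul(A, J))  # orthogonal similarity
                Q = matmul(Q, J)
    return [A[i][i] for i in range(n)], Q


A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
values, Q = jacobi_eigen(A)
print(sorted(values))  # the exact eigenvalues are 2 - sqrt(2), 2, 2 + sqrt(2)
```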


The next result gives a very useful characterization of positive (respectively 
positive definite) symmetric matrices: 


Theorem 10.102. Let $A \in M_n(\mathbb{R})$ be a symmetric matrix. Then the following statements are equivalent:

a) $A$ is positive.

b) All eigenvalues of $A$ are nonnegative.

c) $A = B^2$ for some symmetric matrix $B \in M_n(\mathbb{R})$.

d) $A = {}^tB \cdot B$ for some matrix $B \in M_n(\mathbb{R})$.


Proof. Suppose that $A$ is positive and that $\lambda$ is an eigenvalue of $A$, with eigenvector $v$. By Theorem 10.98, $\lambda$ is real, so we may take $v \in \mathbb{R}^n$. Since $Av = \lambda v$, we obtain

$$\lambda\|v\|^2 = \langle v, Av\rangle = {}^tv A v \geq 0,$$

thus $\lambda \geq 0$. It follows that a) implies b).

Assume that b) holds and let $\lambda_1, \dots, \lambda_n$ be all eigenvalues of $A$, counted with multiplicities. By assumption $\lambda_i \geq 0$ for all $i \in [1, n]$. By the spectral theorem we can find an orthogonal matrix $P$ such that $PAP^{-1} = D$, where $D$ is the diagonal matrix with entries $\lambda_1, \dots, \lambda_n$. Let $D_1$ be the diagonal matrix with entries $\mu_i = \sqrt{\lambda_i}$, and let $B = P^{-1}D_1P$. Then $B$ is symmetric, since $P$ is orthogonal and $D_1$ is symmetric:

$${}^tB = {}^tP \, {}^tD_1 \, {}^t(P^{-1}) = P^{-1} D_1 P = B.$$

Moreover, by construction $B^2 = P^{-1}D_1^2P = P^{-1}DP = A$. Thus c) holds.

It is clear that c) implies d), since a symmetric $B$ satisfies ${}^tB \cdot B = B^2$. Finally, if d) holds, then for all $X \in \mathbb{R}^n$ we have

$${}^tXAX = {}^tX\,{}^tB B X = \|BX\|^2 \geq 0,$$

and so $A$ is positive. □


The reader is invited to state and prove the corresponding theorem for positive 
definite matrices. 

After this hard work, we will take a break and see some nice applications of the 
above theorems. The result established in the next problem is very important. 
Problem 10.103. a) Let $T$ be a symmetric positive definite linear transformation on a Euclidean space $V$. Prove that for all $d \geq 2$ there is a unique symmetric positive definite linear transformation $T_d$ such that $T_d^d = T$. Moreover, prove that there is a polynomial $P_d \in \mathbb{R}[X]$ such that $T_d = P_d(T)$.

b) Let $A \in M_n(\mathbb{R})$ be a symmetric positive definite matrix. Prove that for all $d \geq 2$ there is a unique symmetric positive definite matrix $A_d$ such that $A_d^d = A$. Moreover, prove that there is a polynomial $P_d \in \mathbb{R}[X]$ such that $A_d = P_d(A)$.


Solution. Clearly part b) is a consequence of part a), so we focus on part a) only.

Let us establish the existence part first. Since $T$ is symmetric and positive definite, there are positive real numbers $\lambda_1, \dots, \lambda_n$ and an orthonormal basis $e_1, \dots, e_n$ of $V$ such that $T(e_i) = \lambda_i e_i$ for $1 \leq i \leq n$. Define $T_d: V \to V$ by $T_d(e_i) = \sqrt[d]{\lambda_i}\, e_i$ for $1 \leq i \leq n$ and extend it by linearity. Then $T_d^d(e_i) = \left(\sqrt[d]{\lambda_i}\right)^d e_i = \lambda_i e_i = T(e_i)$ for $1 \leq i \leq n$, thus $T_d^d = T$. Moreover, $T_d$ is symmetric and positive definite. Indeed, in the orthonormal basis $e_1, \dots, e_n$ the matrix of $T_d$ is diagonal with positive entries.

Next, we prove that $T_d$ is a polynomial in $T$. It suffices to prove that there is a polynomial $P$ such that $P(\lambda_i) = \sqrt[d]{\lambda_i}$ for $1 \leq i \leq n$, as then

$$P(T)(e_i) = P(\lambda_i)e_i = \sqrt[d]{\lambda_i}\, e_i = T_d(e_i),$$

thus $P(T) = T_d$. To prove the existence of $P$, assume without loss of generality that the distinct numbers appearing in the list $\lambda_1, \dots, \lambda_n$ are $\lambda_1, \dots, \lambda_k$ for some $1 \leq k \leq n$. It is enough to construct a polynomial $P$ such that $P(\lambda_i) = \sqrt[d]{\lambda_i}$ for $1 \leq i \leq k$. Simply take the Lagrange interpolation polynomial associated with the data $(\lambda_1, \dots, \lambda_k)$ and $\left(\sqrt[d]{\lambda_1}, \dots, \sqrt[d]{\lambda_k}\right)$.

Let us now prove that $T_d$ is unique. Let $S$ be a symmetric positive definite linear transformation such that $S^d = T$. Then $S$ commutes with $T = S^d$, thus it also commutes with any polynomial in $T$. It follows from the previous paragraph that $S$ commutes with $T_d$. Since $S$ and $T_d$ are diagonalizable and commute, there is a basis $f_1, \dots, f_n$ of $V$ in which the matrices of $S$ and $T_d$ are both diagonal, say $D_1$ and $D_2$.

The entries $a_1, \dots, a_n$ of $D_1$ and $b_1, \dots, b_n$ of $D_2$ are positive, since they are the eigenvalues of $S$ and $T_d$. They satisfy $a_i^d = b_i^d$ for $1 \leq i \leq n$, since $S^d = T_d^d = T$. It follows that $a_i = b_i$ for $1 \leq i \leq n$, and then $D_1 = D_2$ and $S = T_d$. Thus $T_d$ is unique. The problem is solved. □


Remark 10.104. a) As the proof shows, the same result applies to symmetric positive (but not necessarily positive definite) linear transformations and matrices. Of course, the resulting transformation $T_d$, respectively matrix $A_d$, will then also be symmetric positive, but not necessarily positive definite.

b) We will simply write $\sqrt[d]{T}$, respectively $\sqrt[d]{A}$, for the linear transformation $T_d$, respectively the matrix $A_d$, in the previous problem.
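The interpolation argument in the solution of Problem 10.103 is constructive. The following Python sketch (names ours) assumes that the distinct eigenvalues $\lambda_1, \dots, \lambda_k$ of a symmetric positive definite matrix are already known. It evaluates the Lagrange interpolation polynomial $P$ with $P(\lambda_j) = \sqrt[d]{\lambda_j}$ directly at the matrix, using

$$P(A) = \sum_j \sqrt[d]{\lambda_j} \prod_{m \neq j} \frac{A - \lambda_m I_n}{\lambda_j - \lambda_m},$$

and then checks that $P(A)^d = A$.

For example, take $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, whose eigenvalues are 1 and 3, and $d = 2$. Then one gets

$$P(A) = \frac{1}{2}\begin{pmatrix} 1 + \sqrt{3} & \sqrt{3} - 1 \\ \sqrt{3} - 1 & 1 + \sqrt{3} \end{pmatrix}.$$

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matpow(X, d):
    Y = X
    for _ in range(d - 1):
        Y = matmul(Y, X)
    return Y


def root_by_interpolation(A, distinct_eigenvalues, d):
    """Evaluate at A the Lagrange polynomial P with P(lam) = lam ** (1/d) on the given eigenvalues.

    A is assumed symmetric positive definite and `distinct_eigenvalues` must list
    its distinct eigenvalues; then P(A) is the d-th root of A from Problem 10.103.
    """
    n = len(A)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    result = [[0.0] * n for _ in range(n)]
    for j, lam_j in enumerate(distinct_eigenvalues):
        basis = identity  # becomes prod_{m != j} (A - lam_m I) / (lam_j - lam_m)
        for m, lam_m in enumerate(distinct_eigenvalues):
            if m != j:
                factor = [[(A[r][c] - lam_m * identity[r][c]) / (lam_j - lam_m) for c in range(n)]
                          for r in range(n)]
                basis = matmul(basis, factor)
        root = lam_j ** (1.0 / d)
        result = [[result[r][c] + root * basis[r][c] for c in range(n)] for r in range(n)]
    return result


A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 1 and 3
B = root_by_interpolation(A, [1.0, 3.0], 2)
print(B, matpow(B, 2))
```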


Consider now a matrix $A \in M_n(\mathbb{R})$. The matrix ${}^tA \cdot A$ is then symmetric and positive. By the previous problem (and the remark following it), there is a unique symmetric positive matrix $S = \sqrt{{}^tA \cdot A}$ such that $S^2 = {}^tA \cdot A$.

Suppose now that $A$ is invertible. Then $S$ is invertible (because ${}^tA \cdot A = S^2$ is invertible), and so we can define

$$U = AS^{-1}.$$

Then, taking into account that $S$ is symmetric, we obtain

$${}^tU \cdot U = {}^t(S^{-1}) \, {}^tA \, A S^{-1} = S^{-1}S^2S^{-1} = I_n,$$

that is, $U$ is orthogonal. We have just obtained half of the following important result:


Theorem 10.105 (Polar Decomposition, Invertible Case). Let $A \in M_n(\mathbb{R})$ be an invertible matrix. There is a unique pair $(S, U)$ with $S$ a symmetric positive definite matrix and $U$ an orthogonal matrix such that $A = US$.


Proof. The existence part follows from the previous discussion. Note that $S$ is positive definite there, being symmetric positive and invertible. It remains to establish the uniqueness of $U$ and $S$. Suppose that $A = US$ with $U$ orthogonal and $S$ symmetric positive definite. Then

$${}^tA \cdot A = {}^tS \, {}^tU \, U S = S^2,$$

and by the uniqueness part in Problem 10.103 we deduce that $S = \sqrt{{}^tA \cdot A}$. Then $U = AS^{-1}$. Hence $U$ and $S$ are unique. □
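For $2 \times 2$ matrices, the decomposition of Theorem 10.105 can be computed in closed form. The sketch below relies on a standard identity that is not proved in the text. If $M$ is a $2 \times 2$ symmetric positive definite matrix, then

$$\sqrt{M} = \frac{M + \sqrt{\det M}\, I_2}{\sqrt{\operatorname{Tr} M + 2\sqrt{\det M}}}.$$

The right-hand side is a polynomial in $M$ that multiplies each eigenvector of $M$ with eigenvalue $\mu$ by $\sqrt{\mu}$. The sketch computes $S = \sqrt{{}^tA \cdot A}$ this way, then $U = AS^{-1}$, as in the existence argument. The function names are ours.

```python
import math


def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def polar_2x2(A):
    """Return (U, S) with A = U S, U orthogonal, S symmetric positive definite (A invertible, 2x2)."""
    (a, b), (c, d) = A
    # M = tA A, symmetric positive definite since A is invertible
    m11, m12, m22 = a * a + c * c, a * b + c * d, b * b + d * d
    r = abs(a * d - b * c)            # sqrt(det M) = |det A|
    t = math.sqrt(m11 + m22 + 2 * r)  # sqrt(Tr M + 2 sqrt(det M))
    S = [[(m11 + r) / t, m12 / t], [m12 / t, (m22 + r) / t]]
    det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det_S, -S[0][1] / det_S], [-S[1][0] / det_S, S[0][0] / det_S]]
    U = matmul2(A, S_inv)             # U = A S^{-1}
    return U, S


A = [[1.0, 2.0], [3.0, 4.0]]
U, S = polar_2x2(A)
print(U, S, matmul2(U, S))
```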


One may wonder what happens when $A = [a_{ij}]$ is no longer invertible. We will prove that we still have a decomposition $A = US$ with $U$ orthogonal and $S$ symmetric positive (but not necessarily positive definite). The pair $(S, U)$ is however no longer unique: if $A = O_n$, then $A = U \cdot O_n$ for any orthogonal matrix $U$.

The existence of the decomposition when $A$ is not invertible is rather tricky. We will consider the matrices $A_k = A + \frac{1}{k} I_n$. Since $A$ has only finitely many eigenvalues, there exists $k_0$ such that the matrix $A_k$ is invertible for all $k \geq k_0$. By the previous theorem applied to $A_k$, we can find an orthogonal matrix $U_k$ and a symmetric positive definite matrix $S_k$ such that

$$A_k = U_k S_k.$$

Write $U_k = [u_{ij}^{(k)}]$ and $S_k = [s_{ij}^{(k)}]$. Since $U_k$ is orthogonal, the sum of squares of the elements in each column of $U_k$ equals 1. Thus $u_{ij}^{(k)} \in [-1, 1]$ for all $i, j \in \{1, \dots, n\}$ and all $k \geq k_0$.

By a classical result in real analysis, any sequence of numbers between $-1$ and $1$ has a convergent subsequence (this is saying that the interval $[-1, 1]$ is compact). We apply this result $n^2$ times, once for each pair $i, j \in \{1, 2, \dots, n\}$, each time passing to a further subsequence. We deduce the existence of a sequence $k_0 < k_1 < k_2 < \dots$ such that

$$u_{ij} := \lim_{l \to \infty} u_{ij}^{(k_l)}$$

exists for all $i, j \in \{1, 2, \dots, n\}$.

We claim that the matrix $U = [u_{ij}]$ is orthogonal. Indeed, passing to the limit in each entry of the equality ${}^tU_{k_l} \cdot U_{k_l} = I_n$ yields ${}^tU \cdot U = I_n$. Moreover, we have

$$S_{k_l} = U_{k_l}^{-1} A_{k_l} = {}^tU_{k_l} A_{k_l}.$$

When $l \to \infty$, each $(i, j)$-entry of ${}^tU_{k_l}$ converges to $u_{ji}$, and each $(i, j)$-entry of $A_{k_l}$ converges to $a_{ij}$. We deduce that for all $i, j \in \{1, 2, \dots, n\}$ the sequence $(s_{ij}^{(k_l)})_{l}$ converges to some $s_{ij}$. The matrix $S = [s_{ij}]$ is symmetric (as a limit of symmetric matrices) and

$$S = {}^tU \cdot A,$$

that is, $A = US$.

It remains to check that $S$ is positive. If $X \in \mathbb{R}^n$, then passing to the limit in the inequality ${}^tX S_{k_l} X \geq 0$ yields ${}^tXSX \geq 0$, thus $S$ is positive. All in all, we have just proved the following:


Theorem 10.106 (Polar Decomposition, The General Case). Any matrix $A \in M_n(\mathbb{R})$ can be written as the product of an orthogonal matrix and a symmetric positive matrix.


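Note that if $A = US$ with $U$ orthogonal and $S$ symmetric positive, then necessarily

$$ {}^tA\cdot A = S^2 $$

and so $S = \sqrt{{}^tA \cdot A}$ is uniquely determined (by the uniqueness of the positive square root, Problem 10.103). We call the eigenvalues of $S$ the singular values of $A$. For more information about these, see the problems section. We end this section with a few other applications of the results seen so far.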
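As a numerical illustration of the invertible case (a sketch, not part of the original text), the Python code below computes $A = US$ for an invertible $2 \times 2$ matrix by taking $S = \sqrt{{}^tA\,A}$ and $U = AS^{-1}$. It assumes $2 \times 2$ input so that the positive square root has a closed form: for a nonzero symmetric positive $M$ with $s = \sqrt{\det M}$, the Cayley-Hamilton theorem gives $\sqrt{M} = (M + sI_2)/\sqrt{\operatorname{Tr}(M) + 2s}$. All helper names are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    """Transpose of a 2x2 matrix."""
    return [[X[j][i] for j in range(2)] for i in range(2)]

def inverse_2x2(X):
    """Inverse of an invertible 2x2 matrix (adjugate divided by determinant)."""
    d = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def sqrt_positive_2x2(M):
    """Positive square root of a nonzero symmetric positive 2x2 matrix M.

    Cayley-Hamilton gives M^2 = Tr(M) M - det(M) I, hence with s = sqrt(det M)
    we have (M + s I)^2 = (Tr(M) + 2 s) M; divide M + s I by sqrt(Tr(M) + 2 s).
    """
    # max(..., 0.0) guards against a tiny negative determinant from rounding
    s = math.sqrt(max(M[0][0] * M[1][1] - M[0][1] * M[1][0], 0.0))
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

def polar_2x2(A):
    """Return (U, S) with A = U S, U orthogonal and S symmetric positive definite.

    Assumes A is an invertible real 2x2 matrix (the invertible case above).
    """
    S = sqrt_positive_2x2(matmul(transpose(A), A))   # S = sqrt(tA . A)
    U = matmul(A, inverse_2x2(S))                    # U = A S^{-1}
    return U, S

U, S = polar_2x2([[1.0, 2.0], [3.0, 4.0]])
print(U)
print(S)
```

Up to rounding, ${}^tU\,U$ is the identity and $US$ reproduces $A$. Here ${}^tA\,A = \begin{pmatrix}10&14\\14&20\end{pmatrix}$.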


Problem 10.107. Let $V$ be a Euclidean space and let $T$ be a symmetric linear transformation on $V$. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $T$. Prove that

$$\sup_{x \in V - \{0\}} \frac{\|T(x)\|}{\|x\|} = \max_{1 \le i \le n} |\lambda_i|.$$

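Solution. By renumbering the eigenvalues, we may assume that $\max_i |\lambda_i| = |\lambda_n|$. Let $e_1, \dots, e_n$ be an orthonormal basis of $V$ in which $T(e_i) = \lambda_i e_i$ for $1 \le i \le n$. If $x \in V - \{0\}$, we can write $x = x_1 e_1 + \dots + x_n e_n$ for some real numbers $x_i$, and we have

$$T(x) = \sum_{i=1}^n \lambda_i x_i e_i.$$

Thus, since the basis is orthonormal,

$$\|T(x)\|^2 = \sum_{i=1}^n \lambda_i^2 x_i^2 \le \lambda_n^2 \sum_{i=1}^n x_i^2 = \lambda_n^2 \|x\|^2,$$

since $\lambda_i^2 x_i^2 \le \lambda_n^2 x_i^2$ for $1 \le i \le n$. We conclude that

$$\sup_{x \in V - \{0\}} \frac{\|T(x)\|}{\|x\|} \le |\lambda_n|.$$

Since

$$\frac{\|T(e_n)\|}{\|e_n\|} = |\lambda_n|,$$

we deduce that the previous inequality is actually an equality, which yields the desired result. □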
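A quick numerical sanity check of Problem 10.107 in dimension 2 (an illustration, not a proof): for the arbitrarily chosen symmetric matrix $\begin{pmatrix}1&2\\2&-3\end{pmatrix}$, the Python sketch below computes the eigenvalues with the closed-form formula for $2 \times 2$ symmetric matrices and compares $\max_i |\lambda_i|$ with the largest value of $\|T(x)\|$ over sampled unit vectors. The function names and the sampling density are assumptions of this sketch.

```python
import math

def sym_eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    r = math.hypot((a - c) / 2, b)
    return mean - r, mean + r

def max_stretch(a, b, c, samples=20000):
    """max ||T(x)|| over sampled unit vectors x = (cos t, sin t), 0 <= t < pi.

    Sampling [0, pi) is enough because x and -x give the same norm.
    """
    best = 0.0
    for k in range(samples):
        t = math.pi * k / samples
        x, y = math.cos(t), math.sin(t)
        best = max(best, math.hypot(a * x + b * y, b * x + c * y))
    return best

a, b, c = 1.0, 2.0, -3.0
print(max(abs(lam) for lam in sym_eigenvalues_2x2(a, b, c)))   # max |lambda_i|
print(max_stretch(a, b, c))                                     # slightly below it
```

Here $\max_i |\lambda_i| = 1 + 2\sqrt{2} \approx 3.8284$. The sampled maximum approaches this value from below as `samples` grows.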


Problem 10.108. Find all nilpotent symmetric matrices $A \in M_n(\mathbb{R})$.


Solution. If $A$ is nilpotent, then all eigenvalues of $A$ are 0. If $A$ is moreover symmetric, then it is diagonalizable, hence similar to the diagonal matrix of its eigenvalues, which is $O_n$; so $A$ must be $O_n$. Thus only the zero matrix is simultaneously symmetric and nilpotent. □


Problem 10.109. Let $A$ be a symmetric matrix with real entries and suppose that $A^k = I_n$ for some positive integer $k$. Prove that $A^2 = I_n$.


Solution. Since $A$ is symmetric and has real entries, its complex eigenvalues are actually real. Since they are moreover $k$th roots of unity, they must be $\pm 1$. Thus all eigenvalues of $A^2$ are equal to 1. Since $A^2$ is symmetric, it is diagonalizable, and since all of its eigenvalues are 1, we must have $A^2 = I_n$. □


Problem 10.110. Let $A \in M_n(\mathbb{R})$ be a symmetric positive matrix. Prove that

$$\sqrt[n]{\det A} \le \frac{1}{n}\operatorname{Tr}(A).$$


Solution. $\det A$ and $\operatorname{Tr}(A)$ do not change if we replace $A$ with any matrix similar to it. Using the spectral theorem, we may therefore assume that $A$ is diagonal. Since $A$ is positive, its diagonal entries $a_i := a_{ii}$ are nonnegative numbers. It suffices therefore to prove that

$$\sqrt[n]{a_1 a_2 \cdots a_n} \le \frac{a_1 + a_2 + \dots + a_n}{n}$$

for all nonnegative real numbers $a_1, \dots, a_n$. This is the AM-GM inequality. Let us recall the proof: the inequality is clear if one of the $a_i$'s is 0. If all $a_i$ are positive, the inequality is a consequence of the convexity of $x \mapsto e^x$ (more precisely of Jensen's inequality applied to $\ln a_1, \dots, \ln a_n$). □
Problem 10.111. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a symmetric positive matrix with eigenvalues $\lambda_1, \dots, \lambda_n$. Prove that if $f : [0, \infty) \to \mathbb{R}$ is a convex function, then

$$f(a_{11}) + f(a_{22}) + \dots + f(a_{nn}) \le f(\lambda_1) + \dots + f(\lambda_n).$$


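Solution. Since $A$ is symmetric and positive, there is an orthogonal matrix $P$ such that $A = PD\,{}^tP$, where $D$ is the diagonal matrix with diagonal entries $\lambda_1, \dots, \lambda_n$ (all nonnegative). Let $P = [p_{ij}]$; then the equality $A = PD\,{}^tP$ yields

$$a_{ij} = \sum_{k=1}^n p_{ik} \lambda_k p_{jk}.$$

Since $P$ is orthogonal, we have $\sum_{k=1}^n p_{ik}^2 = 1$ for all $i$, and since $f$ is convex, we deduce that

$$f(a_{ii}) = f\Big(\sum_{k=1}^n p_{ik}^2 \lambda_k\Big) \le \sum_{k=1}^n p_{ik}^2 f(\lambda_k).$$

Adding up these inequalities yields

$$\sum_{i=1}^n f(a_{ii}) \le \sum_{i=1}^n \sum_{k=1}^n p_{ik}^2 f(\lambda_k) = \sum_{k=1}^n f(\lambda_k) \sum_{i=1}^n p_{ik}^2 = \sum_{k=1}^n f(\lambda_k),$$

the last equality being again a consequence of the fact that $P$ is orthogonal (so that its columns also have norm 1). The result follows. □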
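The inequality of Problem 10.111 can be tested numerically on a $2 \times 2$ example. The Python sketch below is an illustration only. It compares $f(a_{11}) + f(a_{22})$ with $f(\lambda_1) + f(\lambda_2)$ for the symmetric positive matrix $\begin{pmatrix}2&1\\1&3\end{pmatrix}$. The matrix, the convex functions $x^2$, $e^x$, $|x - 2.5|$, and the helper names are arbitrary choices.

```python
import math

def sym_eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    r = math.hypot((a - c) / 2, b)
    return mean - r, mean + r

def diagonal_vs_spectrum(a, b, c, f):
    """Return (f(a_11) + f(a_22), f(lambda_1) + f(lambda_2)) for [[a, b], [b, c]]."""
    lam1, lam2 = sym_eigenvalues_2x2(a, b, c)
    return f(a) + f(c), f(lam1) + f(lam2)

# A = [[2, 1], [1, 3]] is symmetric positive (trace 5 > 0, det 5 > 0).
for f in (lambda x: x * x, math.exp, lambda x: abs(x - 2.5)):
    lhs, rhs = diagonal_vs_spectrum(2.0, 1.0, 3.0, f)
    print(f"{lhs:.6f} <= {rhs:.6f}")
```

For $f(x) = x^2$, the two sides are $13$ and $\operatorname{Tr}(A^2) = 15$.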


Problem 10.112. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a symmetric positive matrix. Prove that

$$\det A \le a_{11} a_{22} \cdots a_{nn}.$$


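Solution. If $\det A = 0$, then everything is clear, since $a_{ii} = {}^te_i A e_i \ge 0$ for all $i$, where $e_1, \dots, e_n$ is the canonical basis of $\mathbb{R}^n$. So suppose that $\det A > 0$. Since the eigenvalues of $A$ are nonnegative and their product is $\det A$, they are all positive, thus $A$ is positive definite. Then $a_{ii} > 0$, since $e_i \ne 0$. If $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$, then $\det A = \lambda_1 \cdots \lambda_n$, thus the inequality is equivalent to

$$\sum_{k=1}^n \log \lambda_k \le \sum_{k=1}^n \log a_{kk}.$$

This follows from Problem 10.111 applied to the convex function $f(x) = -\log x$. The solution of Problem 10.111 works verbatim for a convex function defined on $(0, \infty)$, which contains all the $\lambda_k$ and $a_{kk}$. □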
Problem 10.113 (Hadamard's Inequality). Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be an arbitrary matrix. Prove that

$$|\det A|^2 \le \prod_{i=1}^n \Big(\sum_{j=1}^n a_{ji}^2\Big).$$

Solution. We will apply Problem 10.112 to the matrix $B = {}^tA\,A$, which is symmetric and positive. Note that $\det B = (\det A)^2$ and $b_{ii} = \sum_{j=1}^n a_{ji}^2$ for all $i$. The result follows therefore from Problem 10.112. □
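Hadamard's inequality is easy to test with exact arithmetic. The following Python sketch computes $\det A$ by Gaussian elimination over $\mathbb{Q}$, using `fractions.Fraction` to avoid rounding. It then compares $(\det A)^2$ with $\prod_i \sum_j a_{ji}^2$. The helper names and the sample matrices are hypothetical choices for this illustration.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals (exact)."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result                 # a row swap flips the sign
        result *= A[col][col]
        for r in range(col + 1, n):          # clear the entries below the pivot
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return result

def hadamard_bound(M):
    """Product over the columns i of sum_j a_ji^2 (right-hand side of the inequality)."""
    n = len(M)
    bound = Fraction(1)
    for i in range(n):
        bound *= sum(Fraction(M[j][i]) ** 2 for j in range(n))
    return bound

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det(A) ** 2, "<=", hadamard_bound(A))
```

For this matrix, $(\det A)^2 = 9$ and the bound is $66 \cdot 93 \cdot 145 = 890010$. For $\begin{pmatrix}1&1\\1&-1\end{pmatrix}$, whose columns are orthogonal, both sides equal $4$, so the bound can be attained.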


10.8.1 Problems for Practice 


1. Give an example of a symmetric matrix with complex coefficients which is not 
diagonalizable. 

2. Let $T$ be a linear transformation on a Euclidean space $V$, and suppose that $V$ has an orthonormal basis consisting of eigenvectors of $T$. Prove that $T$ is symmetric (thus the converse of the spectral theorem holds).

3. Consider the matrix

$$A = \begin{pmatrix} 1 & -2 & -2 \\ -2 & 1 & -2 \\ -2 & -2 & 1 \end{pmatrix}.$$

a) Explain why $A$ is diagonalizable in $M_3(\mathbb{R})$.
b) Find an orthogonal matrix $P$ such that $P^{-1}AP$ is diagonal.


4. Find an orthogonal basis consisting of eigenvectors for the matrix

$$A = \frac{1}{7}\begin{pmatrix} -2 & 6 & -3 \\ 6 & 3 & 2 \\ -3 & 2 & 6 \end{pmatrix}.$$


5. Let $A \in M_n(\mathbb{R})$ be a nilpotent matrix such that $A\,{}^tA = {}^tA\,A$. Prove that $A = O_n$. Hint: prove that $B = A\,{}^tA$ is nilpotent.

6. Let $A \in M_n(\mathbb{R})$ be a matrix. Prove that $A\,{}^tA$ and ${}^tA\,A$ are similar (in fact $A$ and ${}^tA$ are always similar matrices, but the proof of this innocent-looking statement is much harder and requires Jordan's classification theorem). Hint: both these matrices are symmetric, hence diagonalizable.

7. Let $A \in M_n(\mathbb{R})$ be a symmetric matrix. Prove that


(Tr(A))* 


Hint: consider an orthonormal basis of eigenvectors for A. 
8. The entries of a matrix $A \in M_n(\mathbb{R})$ are between $-1$ and $1$. Prove that

$$|\det A| \le n^{n/2}.$$

Hint: use Hadamard's inequality.

9. Let $A, B \in M_n(\mathbb{R})$ be matrices such that ${}^tA\,A = {}^tB\,B$. Prove that there is an orthogonal matrix $U \in M_n(\mathbb{R})$ such that $B = UA$. Hint: use the polar decomposition.

10. (The Courant-Fischer theorem) Let $E$ be a Euclidean space of dimension $n$ and let $p \in [1, n]$ be an integer. Let $T$ be a symmetric linear transformation on $E$ and let $\lambda_1 \le \dots \le \lambda_n$ be its eigenvalues.

a) Let $e_1, \dots, e_n$ be an orthonormal basis of $E$ such that $T(e_i) = \lambda_i e_i$ for all $1 \le i \le n$ and let $F = \operatorname{Span}(e_1, \dots, e_p)$. Prove that

$$\max_{\substack{x \in F \\ \|x\| = 1}} \langle T(x), x \rangle \le \lambda_p.$$

b) Let $F$ be a subspace of $E$ of dimension $p$. Prove that $F \cap \operatorname{Span}(e_p, \dots, e_n)$ is nonzero and deduce that

$$\max_{\substack{x \in F \\ \|x\| = 1}} \langle T(x), x \rangle \ge \lambda_p.$$

c) Prove the Courant-Fischer theorem:

$$\lambda_p = \min_{\dim F = p} \ \max_{\substack{x \in F \\ \|x\| = 1}} \langle T(x), x \rangle,$$

the minimum being taken over all subspaces $F$ of $E$ of dimension $p$.


11. Find all matrices $A \in M_n(\mathbb{R})$ satisfying $A\,{}^tA\,A = I_n$. Hint: start by proving that any solution of the problem is a symmetric matrix.

12. Find all symmetric matrices $A \in M_n(\mathbb{R})$ such that

$$A + A^2 + A^3 = 3I_n.$$


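13. Let $A, B \in M_n(\mathbb{R})$ be symmetric positive matrices.

a) Let $e_1, \dots, e_n$ be an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $B$, say $Be_i = \lambda_i e_i$. Let $\mu_i = \langle Ae_i, e_i \rangle$. Explain why $\lambda_i, \mu_i \ge 0$ for all $i$ and why

$$\operatorname{Tr}(A) = \sum_{i=1}^n \mu_i \quad \text{and} \quad \operatorname{Tr}(AB) = \sum_{i=1}^n \lambda_i \mu_i.$$

b) Prove that

$$\operatorname{Tr}(AB) \le \operatorname{Tr}(A) \cdot \operatorname{Tr}(B).$$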


14. Let $A = [a_{ij}] \in M_n(\mathbb{R})$ be a symmetric matrix and let $\lambda_1, \dots, \lambda_n$ be its eigenvalues (counted with multiplicities). Prove that

15. (Cholesky's decomposition) Let $A$ be a symmetric positive definite matrix in $M_n(\mathbb{R})$. Prove that there is a unique upper-triangular matrix $T \in M_n(\mathbb{R})$ with positive diagonal entries such that

$$A = {}^tT \cdot T.$$

Hint: for the existence part, consider the inner product $\langle x, y \rangle_1 = \langle Ax, y \rangle$ on $\mathbb{R}^n$ (with $\langle \ , \ \rangle$ the canonical inner product on $\mathbb{R}^n$). Apply the Gram-Schmidt process to the canonical basis $\mathcal{B}$ of $\mathbb{R}^n$ and to the inner product $\langle \ , \ \rangle_1$. Then consider the change of basis matrix from $\mathcal{B}$ to the basis given by the Gram-Schmidt process.

16. a) Let $V$ be a Euclidean space and let $T$ be a linear transformation on $V$. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $T^* \circ T$. Prove that

$$\sup_{x \in V - \{0\}} \frac{\|T(x)\|}{\|x\|} = \max_{1 \le i \le n} \sqrt{\lambda_i}.$$

b) Let $V$ be a Euclidean space and let $T$ be a symmetric linear transformation on $V$. Let $\lambda_1 \le \dots \le \lambda_n$ be the eigenvalues of $T$. Prove that

$$\sup_{x \in V - \{0\}} \frac{\langle T(x), x \rangle}{\|x\|^2} = \lambda_n.$$

17. Let $A, B \in M_n(\mathbb{R})$ be symmetric matrices. Define a map $f : \mathbb{R} \to \mathbb{R}$ by: $f(t)$ is the largest eigenvalue of $A + tB$. Prove that $f$ is a convex function. Hint: use Problem 16.

18. Let $T$ be a diagonalizable linear transformation on a Euclidean space $V$. Prove that if $T$ and $T^*$ commute, then $T$ is symmetric.

19. Let $V$ be the vector space of polynomials with real coefficients whose degree does not exceed $n$, endowed with the inner product

$$\langle P, Q \rangle = \int_0^1 P(x)Q(x)\,dx.$$
Consider the map $T : V \to V$ defined by

$$T(P)(X) = \int_0^1 (X + t)^n P(t)\,dt.$$

a) Give a precise meaning to $T(P)(X)$ and prove that $T$ is a symmetric linear transformation on $V$.

b) Let $P_0, \dots, P_n$ be an orthonormal basis of $V$ consisting of eigenvectors for $T$, with corresponding eigenvalues $\lambda_0, \dots, \lambda_n$. Prove that for all $x, y \in \mathbb{R}$ we have

$$(x + y)^n = \sum_{k=0}^n \lambda_k P_k(x) P_k(y).$$

20. Prove that if $A, B$ are symmetric positive matrices in $M_n(\mathbb{R})$, then

$$\det(A + B) \ge \det A + \det B.$$

21. a) Prove that if $x_1, \dots, x_n$ are real numbers and $\lambda_1, \dots, \lambda_n$ are positive real numbers, then

$$\Big(\sum_{i=1}^n \lambda_i x_i^2\Big)\Big(\sum_{i=1}^n \frac{x_i^2}{\lambda_i}\Big) \ge \Big(\sum_{i=1}^n x_i^2\Big)^2.$$

b) Prove that if $T$ is a symmetric and positive definite linear transformation on a Euclidean space $V$, then for all $x \in V$ we have

$$\langle T(x), x \rangle \langle T^{-1}(x), x \rangle \ge \|x\|^4.$$

22. a) Prove that if $\lambda_1, \dots, \lambda_n$ are nonnegative real numbers, then

$$\sqrt[n]{(1 + \lambda_1) \cdots (1 + \lambda_n)} \ge 1 + \sqrt[n]{\lambda_1 \cdots \lambda_n}.$$

Hint: check that the map $f(x) = \ln(1 + e^x)$ is convex on $\mathbb{R}$ and use Jensen's inequality.
b) Let $A \in M_n(\mathbb{R})$ be a symmetric positive definite matrix. Prove that

$$\sqrt[n]{\det(I_n + A)} \ge 1 + \sqrt[n]{\det A}.$$

23. (Singular value decomposition) Let $\lambda_1, \dots, \lambda_n$ be the singular values of $A \in M_n(\mathbb{R})$, counted with multiplicities (algebraic or geometric, it does not matter since $S = \sqrt{{}^tA\,A}$ is diagonalizable).
a) Prove the existence of orthonormal bases $e_1, \dots, e_n$ and $f_1, \dots, f_n$ of $\mathbb{R}^n$ such that $Ae_i = \lambda_i f_i$ for $1 \le i \le n$. Hint: let $A = US$ be the polar decomposition of $A$. Pick an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ such that $Se_i = \lambda_i e_i$ and set $f_i = Ue_i$.

b) Prove that if $e_1, \dots, e_n$ and $f_1, \dots, f_n$ are bases as in a), then for all $X \in \mathbb{R}^n$ we have

$$AX = \sum_{i=1}^n \lambda_i \langle X, e_i \rangle f_i.$$

We call this the singular value decomposition of $A$.

c) Assume that $A$ is invertible and let $e_1, \dots, e_n$ and $f_1, \dots, f_n$ be orthonormal bases of $\mathbb{R}^n$ giving a singular value decomposition of $A$. Prove that the singular value decomposition of $A^{-1}$ is given by

$$A^{-1}X = \sum_{i=1}^n \frac{1}{\lambda_i} \langle X, f_i \rangle e_i.$$

d) Prove that two matrices $A_1, A_2 \in M_n(\mathbb{R})$ have the same singular values if and only if there are orthogonal matrices $U_1, U_2$ such that

$$A_2 = U_1 A_1 U_2.$$

e) Prove that $A$ is invertible if and only if 0 is not a singular value of $A$.

f) Compute the rank of $A$ in terms of the singular values of $A$.

g) Prove that $A$ is an orthogonal matrix if and only if all of its singular values are equal to 1.

24. The goal of this long exercise is to establish the analogues of the main results of this section for hermitian spaces.

Let $V$ be a hermitian space, that is a finite dimensional $\mathbb{C}$-vector space endowed with a hermitian inner product $\langle \ , \ \rangle$. A linear transformation $T : V \to V$ is called hermitian if $\langle T(x), y \rangle = \langle x, T(y) \rangle$ for all $x, y \in V$.

a) Let $e_1, \dots, e_n$ be an orthonormal basis of $V$. Prove that $T$ is hermitian if and only if the matrix $A$ of $T$ with respect to $e_1, \dots, e_n$ is hermitian, that is $A = A^*$ (recall that $A^* = {}^t\overline{A}$).

From now on, until part e), we let $T$ be a hermitian linear transformation on $V$.

b) Prove that the eigenvalues of $T$ are real numbers.

c) Prove that if $W$ is a subspace of $V$ stable under $T$, then $W^\perp$ is also stable under $T$, and the restrictions of $T$ to $W$ and $W^\perp$ are hermitian linear transformations on these subspaces.

d) Prove that there is an orthonormal basis of $V$ consisting of eigenvectors of $T$.

e) Conversely, prove that if $V$ has an orthonormal basis consisting of eigenvectors of $T$ with real eigenvalues, then $T$ is hermitian.

f) Prove that for any hermitian matrix $A \in M_n(\mathbb{C})$ we can find a unitary matrix $P$ and a diagonal matrix $D$ with real entries such that $A = P^{-1}DP$.

g) Let $T : V \to V$ be any invertible linear transformation. Prove that there is a unique pair $(H, U)$ of linear transformations on $V$ such that $H$ is hermitian positive (i.e., $H$ is hermitian and its eigenvalues are positive), $U$ is unitary and $T = U \circ H$.


Chapter 11 
Appendix: Algebraic Prerequisites 


Abstract This appendix recalls the basic algebraic structures that are needed in the 
study of linear algebra, with special emphasis on permutations and polynomials. 


Even though the main objects of this book are vector spaces and linear maps 
between them, groups and polynomials naturally appear at several key moments 
in the development of linear algebra. In this brief chapter we define these objects 
and state the main properties that will be needed in the sequel. The reader is advised to skip this chapter on a first reading and to return to it whenever it is referred to.


11.1 Groups 


Morally, a group is just a set in which one can multiply objects of the set (staying 
in that set) according to some rather natural rules. Formally, we have the following 
definition. 


Definition 11.1. A group is a nonempty set $G$ endowed with a map $\cdot : G \times G \to G$ satisfying the following properties:

a) (associativity) For all $a, b, c \in G$ we have $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
b) (identity) There is an element $e \in G$ such that $a \cdot e = e \cdot a = a$ for all $a \in G$.
c) (existence of inverses) For all $a \in G$ there is $a^{-1} \in G$ such that $a \cdot a^{-1} = a^{-1} \cdot a = e$.

If moreover $a \cdot b = b \cdot a$ for all $a, b \in G$, we say that the group $G$ is commutative or abelian.


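Note that the element $e$ of $G$ is unique. Indeed, if $e'$ is another element with the same properties, then $e' = e' \cdot e = e$. The first equality holds because $e$ is an identity element, and the second because $e'$ is. We call $e$ the identity element of $G$. Secondly, the element $a^{-1}$ is also unique, for if $x$ is another element with the same properties, then

$$x = x \cdot e = x \cdot (a \cdot a^{-1}) = (x \cdot a) \cdot a^{-1} = e \cdot a^{-1} = a^{-1}.$$

We call $a^{-1}$ the inverse of $a$.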
We will usually write $ab$ instead of $a \cdot b$. Moreover, if the group $G$ is abelian, we will usually prefer the additive notation $a + b$ instead of $ab$ and write $0$ instead of $e$, and $-a$ instead of $a^{-1}$.

Since the definition of a group is not very restrictive, there is a huge number of interesting groups. For instance, all vector spaces (which we haven't properly defined yet, but which are the main actors of this book) are examples of commutative groups under addition. There are many other groups, which we will see in action further on: groups of permutations of a set, groups of invertible linear transformations of a vector space, the group of positive real numbers under multiplication, the group of integers under addition, etc.


11.2 Permutations 


11.2.1 The Symmetric Group Sn 


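A bijective map $\sigma : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}$ is called a permutation of degree $n$. We usually describe a permutation by a table

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(n) \end{pmatrix},$$

where the second line represents the images of $1, 2, \dots, n$ by $\sigma$.

The set of all permutations of degree $n$ is denoted by $S_n$. It is not difficult to see that $S_n$ has $n!$ elements. We have $n$ choices for $\sigma(1)$, then $n - 1$ choices for $\sigma(2)$ (as it can be any element different from $\sigma(1)$), $\dots$, and finally one choice for $\sigma(n)$. Thus there are $n \cdot (n - 1) \cdot \ldots \cdot 1 = n!$ choices in total.

We denote by $e$ the identity map sending $k$ to $k$ for $1 \le k \le n$, thus

$$e = \begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}.$$

The product $\sigma\tau$ of two permutations $\sigma, \tau \in S_n$ is defined as the composition $\sigma \circ \tau$. Thus for all $1 \le k \le n$

$$(\sigma\tau)(k) = \sigma(\tau(k)).$$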


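Example 11.2. Let $\sigma, \tau \in S_4$ be the permutations given by

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}, \qquad \tau = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix}.$$

Then

$$\sigma\tau = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 1 & 3 \end{pmatrix}$$

and

$$\tau\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix}\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 2 & 3 \end{pmatrix}.$$

Since $\sigma$ and $\tau$ are bijections, so is their composition and so $\sigma\tau \in S_n$. The easy proof of the following theorem is left to the reader.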


Theorem 11.3. Endowed with the previously defined multiplication, $S_n$ is a group with $n!$ elements.


Note that the inverse of a permutation with respect to multiplication is simply its inverse as a bijective map (i.e., $\sigma^{-1}$ is the unique map such that $\sigma^{-1}(x) = y$ whenever $\sigma(y) = x$). For example, the inverse of the permutation

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 4 & 5 & 1 & 3 \end{pmatrix}$$

is the permutation

$$\sigma^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 1 & 5 & 2 & 3 \end{pmatrix}.$$

Example 11.2 shows that we generally have $\sigma\tau \ne \tau\sigma$, thus $S_n$ is a noncommutative group in general. Actually, $S_n$ is noncommutative for all $n \ge 3$, while $S_1$ and $S_2$ are commutative. The group $S_n$ is called the symmetric group of degree $n$ or the group of permutations of degree $n$.
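To experiment with these definitions, one can store a permutation of $\{1, \dots, n\}$ as the tuple $(\sigma(1), \dots, \sigma(n))$, that is, the second line of its table. This encoding and the function names below are choices made for this Python sketch. The sketch reproduces the products of Example 11.2 and the inverse computed above.

```python
def compose(sigma, tau):
    """Product sigma*tau = sigma o tau, i.e. k -> sigma(tau(k)).

    A permutation of {1, ..., n} is stored as the tuple (sigma(1), ..., sigma(n)).
    """
    return tuple(sigma[tau[k] - 1] for k in range(len(tau)))

def inverse(sigma):
    """The inverse bijection: sigma^{-1}(x) = y whenever sigma(y) = x."""
    inv = [0] * len(sigma)
    for y, x in enumerate(sigma, start=1):
        inv[x - 1] = y
    return tuple(inv)

sigma = (2, 3, 4, 1)
tau = (3, 1, 4, 2)
print(compose(sigma, tau))          # → (4, 2, 1, 3)
print(compose(tau, sigma))          # → (1, 4, 2, 3)
print(inverse((2, 4, 5, 1, 3)))     # → (4, 1, 5, 2, 3)
```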


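Problem 11.4. Let $\sigma \in S_n$, where $n \ge 3$. Prove that if $\sigma\alpha = \alpha\sigma$ for all permutations $\alpha \in S_n$, then $\sigma = e$.

Solution. Fix $i \in \{1, 2, \dots, n\}$ and choose a permutation $\alpha$ having $i$ as unique fixed point. This is possible because $n \ge 3$: $\alpha$ can cycle through the $n - 1 \ge 2$ elements different from $i$. For instance, take

$$\alpha = \begin{pmatrix} 1 & 2 & \cdots & i-1 & i & i+1 & \cdots & n \\ 2 & 3 & \cdots & i+1 & i & i+2 & \cdots & 1 \end{pmatrix}.$$

Since

$$\sigma(i) = \sigma(\alpha(i)) = \alpha(\sigma(i)),$$

$\sigma(i)$ is a fixed point of $\alpha$. Since $i$ is the unique fixed point of $\alpha$, we must have $\sigma(i) = i$. As $i$ was arbitrary, the result follows. □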
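For small $n$, Problem 11.4 can be confirmed by brute force. The Python sketch below uses the same tuple encoding of permutations, which is an assumption of this illustration, and a hypothetical helper `center`. It lists all $\sigma \in S_n$ that commute with every $\alpha \in S_n$. The output also shows why the hypothesis $n \ge 3$ is needed: for $n = 2$, both elements of $S_2$ qualify.

```python
from itertools import permutations

def compose(sigma, tau):
    """k -> sigma(tau(k)) for permutations stored as tuples of images."""
    return tuple(sigma[t - 1] for t in tau)

def center(n):
    """All sigma in S_n commuting with every alpha in S_n (brute force)."""
    group = list(permutations(range(1, n + 1)))
    return [s for s in group
            if all(compose(s, a) == compose(a, s) for a in group)]

for n in range(1, 6):
    print(n, center(n))
```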
11.2.2 Transpositions as Generators of Sn


The group $S_n$ has a special class of elements which have a rather simple structure and which generate the whole group $S_n$, in the sense that any element of $S_n$ is a product of some of these elements. They are called transpositions and are defined as follows.


Definition 11.5. Let $i, j \in \{1, 2, \dots, n\}$ be distinct. The transposition $(ij)$ is the permutation $\sigma$ sending $k$ to $k$ for all $k \ne i, j$ and for which $\sigma(i) = j$ and $\sigma(j) = i$. Thus $(ij)$ exchanges $i$ and $j$, while keeping all the other elements fixed.


It follows straight from the definition that a transposition $\tau$ satisfies $\tau^2 = e$ and so $\tau^{-1} = \tau$. Note also that the set $\{i, j\}$ is uniquely determined by the transposition $(ij)$, since it is exactly the set of those $k \in \{1, 2, \dots, n\}$ for which $(ij)(k) \ne k$.


Since there are

$$\binom{n}{2} = \frac{n(n-1)}{2}$$

subsets with two elements of $\{1, 2, \dots, n\}$, it follows that there are $\binom{n}{2}$ transpositions. Let us prove now that the group $S_n$ is generated by transpositions.


Theorem 11.6. Let $n \ge 2$. Any permutation $\sigma \in S_n$ is a product of transpositions.


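Proof. For $\sigma \in S_n$ we let $m_\sigma$ be the number of elements $k \in \{1, 2, \dots, n\}$ for which $\sigma(k) \ne k$. We prove the theorem by induction on $m_\sigma$. If $m_\sigma = 0$, then $\sigma = e = (12)^2$ and we are done.

Assume that $m_\sigma > 0$ and that the statement holds for all permutations $\alpha \in S_n$ with $m_\alpha < m_\sigma$. Since $m_\sigma > 0$, there is $i \in \{1, 2, \dots, n\}$ such that $\sigma(i) \ne i$. Let $j = \sigma(i)$, $\tau = (ij)$ and $\alpha = \sigma\tau$. Let $A = \{k : \alpha(k) \ne k\}$ and $B = \{k : \sigma(k) \ne k\}$. Note that if $\sigma(k) = k$, then $k \ne i$ (as $\sigma(i) \ne i$) and $k \ne j$ (as $\sigma(j) \ne \sigma(i) = j$ by injectivity), hence

$$\alpha(k) = (\sigma\tau)(k) = \sigma(\tau(k)) = \sigma(k) = k.$$

This shows that $A \subset B$. Moreover, we have $A \ne B$ since $j$ belongs to $B$ but not to $A$ (indeed $\alpha(j) = \sigma(\tau(j)) = \sigma(i) = j$). It follows that $m_\alpha < m_\sigma$.

Using the induction hypothesis, we can write $\alpha$ as a product of transpositions. Since $\sigma = \alpha\tau^{-1} = \alpha\tau$, $\sigma$ itself is a product of transpositions and we are done. □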


Note that the proof of the theorem also gives an algorithm for expressing a given permutation as a product of transpositions. Let us see a concrete example. Let

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 1 & 3 \end{pmatrix}.$$
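Since $\sigma(1) = 2$, we compute $\sigma \cdot (12)$ in order to create a fixed point:

$$\sigma_1 = \sigma \cdot (12) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 1 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 3 & 4 & 5 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 4 & 1 & 3 \end{pmatrix}.$$

Because $\sigma_1(1) = 5$, we compute $\sigma_1 \cdot (15)$ to create a new fixed point:

$$\sigma_2 = \sigma_1 \cdot (15) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 4 & 1 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 2 & 3 & 4 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 2 & 4 & 1 & 5 \end{pmatrix}.$$

Computing $\sigma_2 \cdot (13)$ we obtain a new fixed point in the permutation

$$\sigma_3 = \sigma_2 \cdot (13) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 2 & 3 & 1 & 5 \end{pmatrix}.$$

Now, observe that $\sigma_3 = (14)$, thus $\sigma_3 \cdot (14) = e$. We deduce that $\sigma \cdot (12) \cdot (15) \cdot (13) \cdot (14) = e$ and so

$$\sigma = (14)(13)(15)(12).$$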
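The algorithm implicit in the proof of Theorem 11.6 can be sketched in Python as follows. Permutations use the same tuple-of-images encoding as before, and the function names are hypothetical. Like the worked example, the sketch always uses the smallest non-fixed point $i$, although the proof allows any choice of $i$.

```python
def compose(sigma, tau):
    """k -> sigma(tau(k)) for permutations stored as tuples of images."""
    return tuple(sigma[t - 1] for t in tau)

def transposition(n, i, j):
    """The transposition (ij) of S_n as a tuple of images."""
    images = list(range(1, n + 1))
    images[i - 1], images[j - 1] = j, i
    return tuple(images)

def as_transpositions(sigma):
    """Write sigma as a product of transpositions, following the proof of Theorem 11.6.

    While sigma is not the identity, take the smallest i with sigma(i) != i,
    set j = sigma(i) and replace sigma by sigma*(ij); this fixes j and keeps
    every fixed point of sigma. If sigma*t_1*...*t_r = e, then
    sigma = t_r*...*t_1, because each transposition is its own inverse.
    """
    n = len(sigma)
    used = []
    while True:
        moved = [k for k in range(1, n + 1) if sigma[k - 1] != k]
        if not moved:
            break
        i = moved[0]
        j = sigma[i - 1]
        sigma = compose(sigma, transposition(n, i, j))
        used.append((i, j))
    return used[::-1]

def product_of_transpositions(n, pairs):
    """Compose the transpositions in the given order: t_1 t_2 ... t_r."""
    result = tuple(range(1, n + 1))
    for i, j in pairs:
        result = compose(result, transposition(n, i, j))
    return result

print(as_transpositions((2, 5, 4, 1, 3)))   # → [(1, 4), (1, 3), (1, 5), (1, 2)]
```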


11.2.3 The Signature Homomorphism 


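An inversion of a permutation $\sigma \in S_n$ is a pair $(i, j)$ with $1 \le i < j \le n$ and $\sigma(i) > \sigma(j)$. Let $\operatorname{Inv}(\sigma)$ be the number of inversions of $\sigma$. Note that

$$0 \le \operatorname{Inv}(\sigma) \le \frac{n(n-1)}{2} \quad \text{for all } \sigma \in S_n,$$

and these inequalities are optimal: $\operatorname{Inv}(e) = 0$ and $\operatorname{Inv}(\sigma_0) = \frac{n(n-1)}{2}$ for

$$\sigma_0 = \begin{pmatrix} 1 & 2 & \cdots & n \\ n & n-1 & \cdots & 1 \end{pmatrix}.$$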
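Example 11.7. The permutation

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 5 & 6 & 2 & 1 & 4 & 3 \end{pmatrix}$$

has $\operatorname{Inv}(\sigma) = 4 + 4 + 1 + 1 = 10$ inversions, since $\sigma(1) > \sigma(3)$, $\sigma(1) > \sigma(4)$, $\sigma(1) > \sigma(5)$, $\sigma(1) > \sigma(6)$, $\sigma(2) > \sigma(3)$, $\sigma(2) > \sigma(4)$, $\sigma(2) > \sigma(5)$, $\sigma(2) > \sigma(6)$, $\sigma(3) > \sigma(4)$, $\sigma(5) > \sigma(6)$.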
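Counting inversions directly from the definition takes $O(n^2)$ comparisons. This short Python sketch lists the inversions of the permutation of Example 11.7. It uses a hypothetical helper name and the tuple-of-images encoding of permutations.

```python
def inversions(sigma):
    """List the pairs (i, j), 1 <= i < j <= n, with sigma(i) > sigma(j)."""
    n = len(sigma)
    return [(i + 1, j + 1) for i in range(n) for j in range(i + 1, n)
            if sigma[i] > sigma[j]]

sigma = (5, 6, 2, 1, 4, 3)
print(len(inversions(sigma)))   # → 10
print(inversions(sigma))        # → [(1, 3), (1, 4), (1, 5), (1, 6), (2, 3), (2, 4), (2, 5), (2, 6), (3, 4), (5, 6)]
```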


We now introduce a fundamental map $\varepsilon : S_n \to \{-1, 1\}$, the signature.


Definition 11.8. The sign of a permutation $\sigma \in S_n$ is defined by
$$\varepsilon(\sigma) = (-1)^{\operatorname{Inv}(\sigma)}.$$


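If $\varepsilon(\sigma) = 1$, then we say that $\sigma$ is an even permutation, and if $\varepsilon(\sigma) = -1$, then we say that $\sigma$ is an odd permutation. Note that a transposition $\tau = (ij)$ with $i < j$ is an odd permutation. Indeed, the number of inversions of $\tau$ is $(j - i) + (j - i - 1) = 2(j - i) - 1$: these are the pairs $(i, k)$ with $i < k \le j$ and the pairs $(k, j)$ with $i < k < j$.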

Here is the fundamental property of the signature map: 


Theorem 11.9. The signature map $\varepsilon : S_n \to \{-1, 1\}$ is a homomorphism of groups, i.e., $\varepsilon(\sigma_1\sigma_2) = \varepsilon(\sigma_1)\varepsilon(\sigma_2)$ for all $\sigma_1, \sigma_2 \in S_n$.


Without giving the formal proof of this theorem, let us mention that the key point is the equality

$$\varepsilon(\sigma) = \prod_{1 \le i < j \le n} \frac{\sigma(i) - \sigma(j)}{i - j}$$

for any $\sigma \in S_n$. This follows rather easily from the definition of $\varepsilon(\sigma)$ and can be used to prove the multiplicative character of $\varepsilon$.
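Theorem 11.9 and the product formula above can be checked exhaustively for small $n$. The Python sketch below is an illustration on $S_4$, not a proof, and its helper names are hypothetical. Using exact rational arithmetic, it verifies that the product formula agrees with $(-1)^{\operatorname{Inv}(\sigma)}$. It also verifies that $\varepsilon(\sigma\tau) = \varepsilon(\sigma)\varepsilon(\tau)$ for all $24 \cdot 24$ pairs.

```python
from fractions import Fraction
from itertools import permutations

def sign_by_inversions(sigma):
    """epsilon(sigma) = (-1)^Inv(sigma)."""
    n = len(sigma)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def sign_by_product(sigma):
    """prod_{i<j} (sigma(i) - sigma(j)) / (i - j), computed exactly.

    Indices are 0-based here, which does not change the differences i - j.
    """
    n = len(sigma)
    p = Fraction(1)
    for i in range(n):
        for j in range(i + 1, n):
            p *= Fraction(sigma[i] - sigma[j], i - j)
    return p

def compose(sigma, tau):
    """k -> sigma(tau(k)) for permutations stored as tuples of images."""
    return tuple(sigma[t - 1] for t in tau)

S4 = list(permutations(range(1, 5)))
# The product formula agrees with the definition of the signature ...
assert all(sign_by_product(s) == sign_by_inversions(s) for s in S4)
# ... and epsilon is multiplicative on all pairs (Theorem 11.9).
assert all(sign_by_inversions(compose(s, t)) == sign_by_inversions(s) * sign_by_inversions(t)
           for s in S4 for t in S4)
print("checked", len(S4), "permutations")
```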


Remark 11.10. a) The signature is the unique nontrivial homomorphism $S_n \to \{-1, 1\}$. Indeed, let $\varphi : S_n \to \{-1, 1\}$ be a surjective homomorphism of groups. If $\tau_1 = (ij)$ and $\tau_2 = (kl)$ are two transpositions, then we can find $\sigma \in S_n$ such that $\tau_2 = \sigma\tau_1\sigma^{-1}$ (indeed, it suffices to impose $\sigma(i) = k$ and $\sigma(j) = l$). Then $\varphi(\tau_2) = \varphi(\sigma)\varphi(\tau_1)\varphi(\sigma)^{-1} = \varphi(\tau_1)$, since $\{-1, 1\}$ is commutative. Thus all transpositions of $S_n$ are sent to the same element of $\{-1, 1\}$. This element must be $-1$, as the transpositions generate $S_n$ and $\varphi$ is not the trivial homomorphism. Thus $\varphi(\tau) = -1 = \varepsilon(\tau)$ for all transpositions $\tau$, and, using again that the transpositions generate $S_n$, it follows that $\varphi = \varepsilon$.

b) Let $\sigma \in S_n$ be a permutation, and write $\sigma = \tau_1\tau_2\cdots\tau_k$, where $\tau_1, \tau_2, \dots, \tau_k$ are transpositions. This decomposition is definitely not unique, but the parity of $k$ is the same in all decompositions. This is not an obvious statement, but it follows easily from the previous theorem: for any such decomposition we must have $\varepsilon(\sigma) = \prod_{i=1}^k \varepsilon(\tau_i) = (-1)^k$, thus the parity of $k$ is independent of the decomposition.
11.3 Polynomials


Let $F$ be a field, for instance $\mathbb{R}$ or $\mathbb{C}$. The set $F[X]$ of polynomials with coefficients in $F$ will play a key role in this book. In this section we recall, without proof, a few basic facts about polynomials.

Any element $P$ of $F[X]$ can be uniquely written as a formal expression

$$P = a_0 + a_1X + \dots + a_nX^n$$

with $a_0, \dots, a_n \in F$. If $P \ne 0$, then at least one of the coefficients $a_0, \dots, a_n$ is nonzero, and (discarding the vanishing top coefficients) we may assume that $a_n \ne 0$. We then say that $P$ has degree $n$ (and write $\deg P = n$) and leading coefficient $a_n$. By convention, the degree of the zero polynomial is $-\infty$. A fundamental property of polynomials with coefficients in a field is the equality

$$\deg(PQ) = \deg P + \deg Q$$

for all polynomials $P, Q \in F[X]$. We say that $P$ is unitary or monic if its leading coefficient is 1. Polynomials of degree 0 or $-\infty$ are also called constant polynomials.


Remark 11.11. Sometimes we will write $P(X)$ instead of $P$ for an element of $F[X]$, in order to emphasize that the variable is $X$.


The first fundamental result is the division algorithm: 


Theorem 11.12. Let $A, B \in F[X]$ with $B \ne 0$. There is a unique pair $(Q, R)$ of elements of $F[X]$ such that $A = BQ + R$ and $\deg R < \deg B$.
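The division algorithm of Theorem 11.12 is ordinary long division: repeatedly cancel the leading term of the current remainder. Here is a Python sketch over $F = \mathbb{Q}$, with exact arithmetic via `fractions.Fraction`. Polynomials are stored as coefficient lists with the constant term first, an encoding chosen for this example, and `poly_divmod` is a hypothetical name.

```python
from fractions import Fraction

def poly_divmod(A, B):
    """Division algorithm in Q[X]: return (Q, R) with A = B*Q + R and deg R < deg B.

    Polynomials are coefficient lists [a_0, a_1, ..., a_n] (constant term first);
    the zero polynomial is the empty list.
    """
    B = [Fraction(b) for b in B]
    while B and B[-1] == 0:              # drop zero leading coefficients
        B.pop()
    if not B:
        raise ZeroDivisionError("B must be nonzero")
    R = [Fraction(a) for a in A]
    Q = [Fraction(0)] * max(len(R) - len(B) + 1, 0)
    while True:
        while R and R[-1] == 0:
            R.pop()
        if len(R) < len(B):              # deg R < deg B: done
            break
        shift = len(R) - len(B)          # cancel the leading term of R
        coeff = R[-1] / B[-1]
        Q[shift] = coeff
        for k, b in enumerate(B):
            R[shift + k] -= coeff * b
    while Q and Q[-1] == 0:
        Q.pop()
    return Q, R

print(poly_divmod([1, -2, 0, 1], [-1, 1]))   # X^3 - 2X + 1 divided by X - 1
```

For instance, dividing $X^3 - 2X + 1$ by $X - 1$ gives quotient $X^2 + X - 1$ and remainder $0$. Dividing $X^2 + 1$ by $2X + 1$ gives $Q = \frac{1}{2}X - \frac{1}{4}$ and $R = \frac{5}{4}$.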


The polynomials $Q$ and $R$ are called the quotient and the remainder, respectively, of $A$ when divided by $B$. We say that $B$ divides $A$ if $R = 0$. We say that a polynomial $P \in F[X]$ is irreducible if $P$ is not constant, but cannot be written as the product of two nonconstant polynomials. Thus all divisors of an irreducible polynomial are either nonzero constant polynomials or nonzero constants times the given polynomial. For instance, all polynomials of degree 1 are irreducible. For some special fields, these are the only irreducible polynomials:


Definition 11.13. A field $F$ is called algebraically closed if any irreducible polynomial $P \in F[X]$ has degree 1.


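An element $a \in F$ is called a root of a polynomial $P \in F[X]$ if $P(a) = 0$. In this case, dividing $P$ by $X - a$ leaves a remainder of degree less than 1, that is, a constant $R$. Evaluating $P = (X - a)Q + R$ at $a$ gives $R = P(a) = 0$. Hence the division algorithm implies the existence of a factorization

$$P = (X - a)Q$$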
for some polynomial $Q \in F[X]$. Repeating this argument, we deduce that if $a_1, a_2, \dots, a_k \in F$ are pairwise distinct roots of $P$, then we can write

$$P = (X - a_1)(X - a_2)\cdots(X - a_k)Q$$

for some polynomial $Q \in F[X]$. At each step, the next root $a_m$ is still a root of the current quotient, because $a_m - a_1, \dots, a_m - a_{m-1}$ are nonzero elements of the field $F$. Taking degrees, we obtain the following:


Theorem 11.14. A nonzero polynomial $P \in F[X]$ of degree $n$ has at most $n$ pairwise distinct roots in $F$.


Stated otherwise, if a polynomial of degree at most n vanishes at n + 1 distinct 
points of F, then it must be the zero polynomial. The notion of irreducible 
polynomial can also be expressed in terms of roots: 


Theorem 11.15. A field $F$ is algebraically closed if and only if any nonconstant polynomial $P \in F[X]$ has a root in $F$. If this is the case, then any nonconstant polynomial $P \in F[X]$ can be written as

$$P = c(X - a_1)^{n_1} \cdots (X - a_k)^{n_k}$$

for some nonzero constant $c \in F$, some pairwise distinct elements $a_1, \dots, a_k$ of $F$ and some positive integers $n_1, \dots, n_k$.


We call $n_i$ the multiplicity of the root $a_i$ of $P$. It is the largest positive integer $m$ for which $(X - a_i)^m$ divides $P$.
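Dividing by $X - a$ leaves the remainder $P(a)$, so the multiplicity of a root can be computed by repeated synthetic division (Horner's scheme). The following Python sketch works over $\mathbb{Q}$. Polynomials are coefficient lists with the constant term first and no trailing zeros, an encoding assumed for this example, and the function names are hypothetical.

```python
from fractions import Fraction

def divide_by_linear(P, a):
    """Synthetic division of P by (X - a): return (Q, r) with P = (X - a)Q + r.

    P is a coefficient list [p_0, ..., p_n] (constant term first); r equals P(a).
    """
    q = [Fraction(0)] * (len(P) - 1)
    acc = Fraction(0)
    for k in range(len(P) - 1, 0, -1):   # Horner's scheme, top coefficient first
        acc = acc * a + P[k]
        q[k - 1] = acc
    r = acc * a + P[0]
    return q, r

def multiplicity(P, a):
    """Largest m such that (X - a)^m divides the nonzero polynomial P."""
    P = [Fraction(c) for c in P]
    m = 0
    while len(P) > 1:                    # a nonzero constant has no roots
        Q, r = divide_by_linear(P, a)
        if r != 0:
            break
        P, m = Q, m + 1
    return m

P = [2, -3, 0, 1]                        # X^3 - 3X + 2 = (X - 1)^2 (X + 2)
print(multiplicity(P, 1), multiplicity(P, -2), multiplicity(P, 0))   # → 2 1 0
```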
Finally, we state the fundamental theorem of algebra: 


Theorem 11.16 (Gauss). The field C of complex numbers is algebraically closed. 